Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams

2020-03-19

Abstract

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering, etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm SALSA for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that SALSA significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
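The single-pass, low-memory setting the abstract describes can be illustrated with a minimal threshold-based sketch in the spirit of the Badanidiyuru et al. (2014) baseline it references (not the paper's SALSA algorithm). The coverage objective, the single guessed optimum value `opt_guess`, and all function names below are illustrative assumptions: an element is kept only if its marginal gain clears a threshold derived from the guessed optimum.

```python
def coverage(sets_chosen):
    """Monotone submodular objective (assumed for illustration):
    number of distinct items covered by the chosen sets."""
    covered = set()
    for s in sets_chosen:
        covered |= s
    return len(covered)

def threshold_stream(stream, k, opt_guess):
    """One pass over the stream, keeping at most k elements.
    Accept e if its marginal gain is at least
    (opt_guess/2 - f(S)) / (k - |S|), the sieve-style rule."""
    S = []
    for e in stream:
        if len(S) == k:
            break  # cardinality budget exhausted
        gain = coverage(S + [e]) - coverage(S)
        if gain >= (opt_guess / 2 - coverage(S)) / (k - len(S)):
            S.append(e)  # element clears the threshold; store it
    return S

# Hypothetical stream of ground sets arriving one at a time.
stream = [{1, 2}, {2, 3}, {4}, {1, 5, 6}, {7, 8}]
summary = threshold_stream(stream, k=2, opt_guess=6)
print(coverage(summary))  # → 3
```

In practice the optimum is unknown, so the full algorithm runs many such thresholds in parallel (one per geometric guess of OPT) and returns the best resulting summary; memory stays O(k log k / ε), independent of the stream length.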

