Adaptive Sampling for SGD by Exploiting Side Information

2020-03-05

Abstract

This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods that exploits any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied on sampling either from a distribution defined over individual training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances can be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions of the space that make a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.
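To make the idea concrete, below is a minimal sketch of class-level adaptive importance sampling for SGD. It is not the paper's exact algorithm: the squared-loss objective, the exponential-moving-average gradient-norm estimator, and all names (e.g. adaptive_class_sgd) are illustrative assumptions. It shows the mechanism the abstract describes: a sampling distribution over classes adapted during optimization, with importance weights keeping the update unbiased.

import numpy as np

def adaptive_class_sgd(X, y, labels, num_classes, lr=0.001, steps=2000, seed=0):
    """SGD with adaptive importance sampling over classes (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    idx_by_class = [np.flatnonzero(labels == c) for c in range(num_classes)]
    class_frac = np.array([len(ix) / n for ix in idx_by_class])  # uniform probability mass of each class
    grad_norm_est = np.ones(num_classes)  # running per-class gradient-norm estimates (assumed estimator)

    for _ in range(steps):
        # Adapt the class-sampling distribution to the current gradient-norm estimates.
        p = class_frac * grad_norm_est
        p = p / p.sum()
        c = rng.choice(num_classes, p=p)
        i = rng.choice(idx_by_class[c])  # uniform sampling within the chosen class

        # Gradient of the squared loss 0.5 * (x_i . w - y_i)^2 for the sampled instance.
        g = (X[i] @ w - y[i]) * X[i]

        # Importance weight keeps the update unbiased w.r.t. uniform instance sampling.
        weight = class_frac[c] / p[c]
        w -= lr * weight * g

        # Exponential moving average of gradient norms for the sampled class.
        grad_norm_est[c] = 0.9 * grad_norm_est[c] + 0.1 * np.linalg.norm(g)
    return w

# Toy usage: instances of one class (label 2) contribute much larger gradients,
# so the adaptive distribution ends up sampling that class more often.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 5))
    labels = rng.integers(0, 3, size=300)
    X[labels == 2] *= 5.0
    y = X @ np.ones(5) + 0.1 * rng.normal(size=300)
    print(adaptive_class_sgd(X, y, labels, num_classes=3))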

