SADAGRAD: Strongly Adaptive Stochastic Gradient Methods

2020-03-11

Abstract

Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition, their iteration complexities have an explicitly linear dependence on the dimensionality of the problem. To alleviate this bad dependence, we propose a simple yet novel variant of ADAGRAD for stochastic (weakly) strongly convex optimization. Different from existing variants, the proposed variant (referred to as SADAGRAD) uses an adaptive restarting scheme in which (i) ADAGRAD serves as a sub-routine and is restarted periodically; and (ii) the number of iterations for restarting ADAGRAD depends on the history of learning, which incorporates knowledge of the geometry of the data. In addition to the adaptive proximal functions and the adaptive number of iterations for restarting, we also develop a variant that is adaptive to the (implicit) strong convexity of the data, which together make the proposed algorithm strongly adaptive. In the worst case, SADAGRAD has an O(1/ε) iteration complexity for finding an ε-optimal solution, similar to other variants. However, it can enjoy faster convergence and a much better dependence on the problem's dimensionality when stochastic gradients are sparse. Extensive experiments on large-scale data sets demonstrate the efficiency of the proposed algorithms in comparison with several variants of ADAGRAD and stochastic gradient methods.
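The restarting scheme described above can be illustrated with a minimal sketch: ADAGRAD runs as an inner sub-routine, and each restart warm-starts from the previous epoch's averaged iterate with a fresh gradient accumulator. Note this is not the paper's algorithm verbatim: the paper adapts the epoch length to the learning history, whereas the doubling schedule below is a hypothetical stand-in, and the function names (`adagrad_epoch`, `sadagrad_sketch`) and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np

def adagrad_epoch(x0, grad_fn, num_iters, eta, eps=1e-8):
    """Run ADAGRAD for num_iters steps from x0; return the averaged iterate."""
    x = x0.copy()
    G = np.zeros_like(x)    # running sum of squared gradients (per coordinate)
    avg = np.zeros_like(x)
    for _ in range(num_iters):
        g = grad_fn(x)
        G += g * g
        x -= eta * g / (np.sqrt(G) + eps)   # coordinate-wise adaptive step
        avg += x / num_iters
    return avg

def sadagrad_sketch(x0, grad_fn, epochs=8, t0=50, eta=0.5):
    """Restarted ADAGRAD: each epoch warm-starts from the previous average.

    The paper's version chooses the epoch length adaptively from the
    observed gradient geometry; doubling it here is a stand-in schedule.
    """
    x, t = x0.copy(), t0
    for _ in range(epochs):
        x = adagrad_epoch(x, grad_fn, t, eta)
        t *= 2    # hypothetical fixed doubling schedule
    return x

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 3.0])
# stochastic gradient of the strongly convex f(x) = 0.5 * ||x - target||^2
grad = lambda x: (x - target) + 0.1 * rng.standard_normal(3)

x_final = sadagrad_sketch(np.zeros(3), grad)
print(np.round(x_final, 1))
```

Restarting resets the accumulator `G`, so the effective step size recovers after each epoch instead of decaying as 1/sqrt(t) forever, which is what lets the restarted scheme exploit strong convexity.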

