Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization


Abstract

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change of the algorithm is necessary. We also present experimental results which support our findings, and point out open problems.
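To illustrate the kind of averaging modification the abstract refers to, the sketch below runs SGD with a 1/(λt) step size on a toy λ-strongly convex problem and compares plain averaging of all iterates with suffix averaging (averaging only the last α-fraction of iterates). This is a minimal Python sketch under stated assumptions: the quadratic toy objective, the noisy gradient oracle, and all function names are illustrative, not code from the paper.

import numpy as np

def sgd_with_averaging(grad_oracle, dim, lam, T, alpha=0.5, seed=0):
    """SGD on a lam-strongly convex objective with step size 1/(lam * t).

    grad_oracle(w, rng) must return an unbiased stochastic gradient at w.
    Returns the average of all iterates and the alpha-suffix average
    (mean of the last alpha*T iterates).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    iterates = np.empty((T, dim))
    for t in range(1, T + 1):
        eta = 1.0 / (lam * t)                 # step size eta_t = 1 / (lambda * t)
        w = w - eta * grad_oracle(w, rng)     # stochastic gradient step
        iterates[t - 1] = w
    full_avg = iterates.mean(axis=0)          # standard averaging of all iterates
    k = max(1, int(alpha * T))
    suffix_avg = iterates[-k:].mean(axis=0)   # average only the last alpha*T iterates
    return full_avg, suffix_avg

if __name__ == "__main__":
    # Hypothetical toy problem: minimize E[0.5 * lam * ||w - x||^2] over noisy draws of x,
    # whose minimizer is the mean vector "target".
    lam, dim, T = 1.0, 5, 10_000
    target = np.ones(dim)

    def grad_oracle(w, rng):
        noisy_target = target + rng.normal(scale=1.0, size=dim)
        return lam * (w - noisy_target)       # unbiased gradient estimate

    full_avg, suffix_avg = sgd_with_averaging(grad_oracle, dim, lam, T)
    print("error, full average  :", np.linalg.norm(full_avg - target))
    print("error, suffix average:", np.linalg.norm(suffix_avg - target))

The only change between the two returned points is which iterates enter the average; the update rule and step sizes are identical, matching the abstract's claim that no other change to the algorithm is necessary.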

