Abstract
Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition, their iteration complexities have an explicit linear dependence on the dimensionality of the problem. To alleviate this undesirable dependence, we propose a simple yet novel variant of ADAGRAD for stochastic (weakly) strongly convex optimization. Different from existing variants, the proposed variant (referred to as SADAGRAD) uses an adaptive restarting scheme in which (i) ADAGRAD serves as a sub-routine and is restarted periodically; and (ii) the number of iterations for restarting ADAGRAD depends on the history of learning that incorporates knowledge of the geometry of the data. In addition to the adaptive proximal functions and the adaptive number of iterations for restarting, we also develop a variant that is adaptive to the (implicit) strong convexity in the data, which together makes the proposed algorithm strongly adaptive. In the worst case, SADAGRAD has an O(1/ε) iteration complexity for finding an ε-optimal solution, similar to other variants. However, it could enjoy faster convergence and a much better dependence on the problem's dimensionality when stochastic gradients are sparse. Extensive experiments on large-scale data sets demonstrate the efficiency of the proposed algorithms in comparison with several variants of ADAGRAD and the stochastic gradient method.
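The restarting scheme described above can be illustrated with a minimal sketch: ADAGRAD runs as a sub-routine with per-coordinate adaptive step sizes, and an outer loop restarts it with a fresh gradient accumulator. Note that this is a hedged illustration, not the paper's actual algorithm: the epoch-length and step-size update rules here (doubling the epoch, halving the step size) are placeholder assumptions, whereas SADAGRAD adapts them to the accumulated gradient history.

```python
import math

def adagrad(grad, x0, eta, T, eps=1e-8):
    """Run plain ADAGRAD for T iterations from x0.

    grad: function mapping a point (list of floats) to its gradient.
    Returns the last iterate and the per-coordinate sum of squared gradients.
    """
    d = len(x0)
    x = list(x0)
    g_sq = [0.0] * d                      # per-coordinate gradient history
    for _ in range(T):
        g = grad(x)
        for i in range(d):
            g_sq[i] += g[i] ** 2
            # Coordinate-wise adaptive step: large history -> small step.
            x[i] -= eta * g[i] / (math.sqrt(g_sq[i]) + eps)
    return x, g_sq

def sadagrad_sketch(grad, x0, eta0, T0, epochs):
    """Illustrative restarting scheme (NOT the paper's exact rule).

    ADAGRAD is restarted periodically; in the paper the next epoch length
    adapts to the learning history (g_sq), while here we simply double it
    and halve the step size as a stand-in assumption.
    """
    x, eta, T = list(x0), eta0, T0
    for _ in range(epochs):
        x, g_sq = adagrad(grad, x, eta, T)  # restart resets g_sq
        T *= 2        # placeholder for the history-dependent epoch length
        eta /= 2.0    # placeholder for the per-epoch step-size decrease
    return x
```

On a simple strongly convex objective such as f(x) = Σ_i (x_i − 1)², each restarted epoch refines the solution with smaller adaptive steps, mimicking the geometric error reduction that the adaptive restart is designed to achieve.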