Abstract
Non-convex optimization problems arise frequently in machine learning, including feature selection, structured matrix learning, mixture modeling, and neural network training. We consider the problem of finding critical points of a broad class of non-convex problems with non-smooth components. We analyze the behavior of two gradient-based methods, namely a sub-gradient method and a proximal method. Our main results establish rates of convergence for general problems, and also exhibit faster rates for sub-analytic functions. As an application of our theory, we obtain a simplification of the popular CCCP algorithm that retains all the desirable convergence properties of the original method, along with a significantly lower cost per iteration. We illustrate our methods and theory via applications to the problems of best subset selection, robust estimation, and shape-from-shading reconstruction.