Abstract
In recent years, a rapidly increasing number of applications in practice requires optimizing non-convex objectives, such as training neural networks, learning graphical models, and maximum likelihood estimation. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak.

We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are "approximately convex", i.e. functions $\tilde{f}: \mathbb{R}^d \to \mathbb{R}$ for which there exists a convex function $f$ such that for all $x$, $|\tilde{f}(x) - f(x)| \le \Delta$ for a fixed value $\Delta$. We then want to minimize $\tilde{f}$, i.e. output a point $\tilde{x}$ such that $\tilde{f}(\tilde{x}) \le \min_x \tilde{f}(x) + \epsilon$. It is quite natural to conjecture that for fixed $\epsilon$, the problem gets harder for larger $\Delta$; however, the exact dependency between $\epsilon$ and $\Delta$ is not known.

In this paper, we significantly improve the known lower bound on $\Delta$ as a function of $\epsilon$, and give an algorithm matching this lower bound for a natural class of convex bodies. More precisely, we identify a function $T: \mathbb{R}^+ \to \mathbb{R}^+$ such that when $\Delta = O(T(\epsilon))$, we can give an algorithm that outputs a point $\tilde{x}$ such that $\tilde{f}(\tilde{x}) \le \min_x \tilde{f}(x) + \epsilon$ within time $\mathrm{poly}(d, 1/\epsilon)$. On the other hand, when $\Delta = \Omega(T(\epsilon))$, we also prove an information-theoretic lower bound showing that any algorithm that outputs such an $\tilde{x}$ must use a super-polynomial number of evaluations of $\tilde{f}$.
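The approximately convex setup above can be sketched concretely. In this minimal illustration, the quadratic choice of $f$, the sinusoidal perturbation, and the value of $\Delta$ are all assumptions made for exposition; this is not the paper's construction or algorithm, only a function satisfying the definition:

```python
import math
import random

# Illustrative sketch of an "approximately convex" function (assumed example,
# not the paper's construction): f is convex, and the queryable oracle
# f_tilde deviates from f by at most DELTA at every point.
DELTA = 0.1  # the fixed approximation level (the paper's Delta)

def f(x):
    """Underlying convex function: a simple quadratic."""
    return sum(xi * xi for xi in x)

def f_tilde(x):
    """Approximately convex oracle: f plus a bounded, non-convex wiggle."""
    wiggle = math.sin(100.0 * sum(x))  # |wiggle| <= 1 by construction
    return f(x) + DELTA * wiggle

# The defining guarantee |f_tilde(x) - f(x)| <= DELTA holds at every x,
# even though f_tilde itself is highly non-convex:
x = [random.uniform(-1.0, 1.0) for _ in range(5)]
assert abs(f_tilde(x) - f(x)) <= DELTA
```

Minimizing `f_tilde` to additive accuracy $\epsilon$ is the problem studied here: any value-based method sees only the perturbed oracle, which is why a large $\Delta$ can information-theoretically hide the minimizer.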