Abstract
We develop a optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn’t limited in applicability to aut encoders, or any specific model class. We also discuss the issue of “pathological curvature” as a possible explanation for the difficulty of deeplearning and how optimization, and our method in particular, effectively deals with it.