Diving into the shallows: a computational perspective on large-scale shallow learning

资源分类

2020-02-12 |

109 |

114 |

Abstract

Remarkable recent success of deep neural networks has not been easy to analyze theoretically. It has been particularly hard to disentangle relative significance of architecture and optimization in achieving accurate classification on large datasets. On the flip side, shallow methods (such as kernel methods) have encountered obstacles in scaling to large data, despite excellent performance on smaller datasets, and extensive theoretical analysis. Practical methods, such as variants of gradient descent used so successfully in deep learning, seem to perform below par when applied to kernel methods. This difficulty has sometimes been attributed to the limitations of shallow architecture. In this paper we identify a basic limitation in gradient descent-based optimization methods when used in conjunctions with smooth kernels. Our analysis demonstrates that only a vanishingly small fraction of the function space is reachable after a polynomial number of gradient descent iterations. That drastically limits the approximating power of gradient descent leading to over-regularization. The issue is purely algorithmic, persisting even in the limit of infinite data. To address this shortcoming in practice, we introduce EigenPro iteration, a simple and direct preconditioning scheme using a small number of approximately computed eigenvectors. It can also be viewed as learning a kernel optimized for gradient descent. Injecting this small, computationally inexpensive and SGD-compatible, amount of approximate second-order information leads to major improvements in convergence. For large data, this leads to a significant performance boost over the state-of-the-art kernel methods. In particular, we are able to match or improve the results reported in the literature at a small fraction of their computational budget. For complete version of this paper see https://arxiv.org/abs/1703.10622.

上一篇：SchNet: A continuous-filter convolutional neural network for modeling quantum interactions

下一篇：A Meta-Learning Perspective on Cold-Start Recommendations for Items

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Deep Cross-media ...

Cross-media retrieval is a research hotspot in ...
Regularizing RNNs...

Recently, caption generation with an encoder-de...
Supervised Descen...

Many computer vision problems (e.
Attributed Graph ...

Graph clustering is a fundamental task which di...
Hierarchical Task...

We extend hierarchical task network planning wi...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com