Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization


2020-03-16


Abstract

Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parameterization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to provably solve the exploding gradient problem, and we observe that it empirically solves the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, hence it can be easily extended to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and show how it controls generalization of RNNs for the classification task. Our extensive experimental results also demonstrate that the proposed framework converges faster and has good generalization, especially in capturing long range dependencies, as shown on the synthetic addition and copy tasks, as well as on the MNIST and Penn Tree Bank data sets.

1 University of Texas at Austin 2 Amazon.com. Correspondence to: Jiong Zhang.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).
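The SVD parameterization described in the abstract can be illustrated with a minimal NumPy sketch. It builds a square matrix W = U diag(sigma) V^T, where U and V are products of Householder reflectors and the singular values are confined to a band around 1. The function names, the tanh-based bounding of the singular values, and the band width r are assumptions made for illustration and are not taken from the paper. The sketch also forms dense orthogonal matrices for readability, whereas an efficient implementation would apply the reflectors to vectors directly.

```python
import numpy as np


def householder_product(vectors):
    """Return the orthogonal matrix H(u_1) H(u_2) ... H(u_k), where
    H(u) = I - 2 u u^T / (u^T u) is a Householder reflector.

    `vectors` has shape (k, n); each row is one reflector vector u_i.
    (Hypothetical helper, named for this sketch only.)
    """
    n = vectors.shape[1]
    q = np.eye(n)
    for u in vectors:
        # Right-multiply q by H(u) without forming H(u) explicitly:
        # q H(u) = q - 2 (q u) u^T / (u^T u)
        q = q - 2.0 * np.outer(q @ u, u) / (u @ u)
    return q


def svd_parameterized_matrix(u_vectors, v_vectors, raw_sigma, r=0.1):
    """Build W = U diag(sigma) V^T with sigma constrained to (1 - r, 1 + r).

    The tanh squashing is an assumed way to bound the singular values;
    any smooth map into the interval would serve the same purpose.
    """
    sigma = 1.0 + r * np.tanh(raw_sigma)
    U = householder_product(u_vectors)
    V = householder_product(v_vectors)
    return U @ np.diag(sigma) @ V.T, sigma


rng = np.random.default_rng(0)
n = 6
W, sigma = svd_parameterized_matrix(
    rng.normal(size=(n, n)),  # reflector vectors for U
    rng.normal(size=(n, n)),  # reflector vectors for V
    rng.normal(size=n),       # unconstrained singular-value parameters
)

# The singular values of W are exactly the controlled values in sigma,
# so the spectral norm of W cannot exceed 1 + r.
singular_values = np.linalg.svd(W, compute_uv=False)
```

Because U and V stay orthogonal for any choice of reflector vectors, gradient updates to those vectors and to raw_sigma never move the singular values of W outside the chosen band, which is the property the abstract relies on to control exploding gradients.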
