Abstract
Gradient clipping is a widely used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, clipping controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum. This intuition has been made precise in a line of recent works, which show that suitable clipping can yield significantly faster convergence than vanilla gradient descent. In this paper, we propose a new lens for studying gradient clipping, namely, robustness: informally, one expects clipping to mitigate the effects of noise, since one does not overly trust any single sample. Surprisingly, we prove that for the common problem of label noise in classification, standard gradient clipping does not in general provide robustness. On the other hand, we show that a simple variant of gradient clipping is robust, and is equivalent to suitably modifying the underlying loss function. As a special case, this yields a simple, noise-robust modification of the standard cross-entropy loss which performs well empirically.
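To fix notation, standard gradient clipping rescales an update whose norm exceeds a threshold, leaving smaller updates untouched. A minimal sketch of this standard operation (the function name `clip_by_norm` and threshold `tau` are illustrative conventions, not taken from this paper; the robust variant studied here is a different, loss-level modification):

```python
import numpy as np

def clip_by_norm(g, tau):
    """Standard gradient clipping: rescale g so its l2 norm is at most tau.

    Gradients with norm below tau pass through unchanged; larger gradients
    keep their direction but have their magnitude capped at tau.
    """
    norm = np.linalg.norm(g)
    if norm > tau:
        g = g * (tau / norm)
    return g
```

Since only the magnitude is capped, a single mislabelled sample can still dominate the update direction, which is consistent with the abstract's claim that this form of clipping need not confer robustness to label noise.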