Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

2019-12-30

Abstract

Overparameterized neural networks trained to minimize average loss can be highly accurate on average on an i.i.d. test set, yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) provides an approach for learning models that instead minimize worst-case training loss over a set of pre-defined groups. We find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss will also already have vanishing worst-case training loss. Instead, their poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization—stronger-than-typical ℓ2 regularization or early stopping—we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements over standard models on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is critical for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce and provide convergence guarantees for a stochastic optimizer for this group DRO setting, underpinning the empirical study above.
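The group DRO objective described above can be optimized online by maintaining a weight per group and updating those weights with exponentiated-gradient ascent, so the weighted training loss tracks the worst-group loss. The sketch below illustrates only that group-reweighting step; the function name, step size `eta_q`, and toy losses are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def group_dro_step(group_losses, q, eta_q=0.1):
    """One exponentiated-gradient update of the group weights q.

    Groups with higher current loss are upweighted multiplicatively,
    then q is renormalized to the probability simplex, so the model's
    weighted objective q . losses moves toward the worst-group loss.
    """
    q = q * np.exp(eta_q * group_losses)
    return q / q.sum()

# Toy illustration: two groups, uniform initial weights.
q = np.array([0.5, 0.5])
group_losses = np.array([1.0, 0.1])  # group 0 is the hard group
for _ in range(50):
    q = group_dro_step(group_losses, q)

# Weighted loss the model parameters would then be trained to minimize.
robust_loss = q @ group_losses
```

In a full training loop, the model parameters take a gradient step on the `q`-weighted per-group losses of each minibatch after every such update of `q`.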
