资源论文right for the right reasons training differentiable models by constraining their explanations

right for the right reasons training differentiable models by constraining their explanations

2019-11-01 | |  57 |   36 |   0
Abstract Neural networks are among the most accurate supervised learning methods in use today. However, their opacity makes them difficult to trust in critical applications, especially if conditions in training may differ from those in test. Recent work on explanations for black-box models has produced tools (e.g. LIME) to show the implicit rules behind predictions. These tools can help us identify when models are right for the wrong reasons. However, these methods do not scale to explaining entire datasets and cannot correct the problems they reveal. We introduce a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients. We apply these penalties both based on expert annotation and in an unsupervised fashion that produces multiple classifiers with qualitatively different decision boundaries. On multiple datasets, we show our approach generates faithful explanations and models that generalize much better when conditions differ between training and test.

上一篇:robust quadratic programming for price optimization

下一篇:maximizing awareness about hiv in social networks of homeless youth with limited information

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...