Abstract
Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.

Figure 1: Original image with its corresponding explanation map (left); manipulated image with its explanation (right). The chosen target explanation was an image with text stating "this explanation was manipulated".
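As a rough illustration of the manipulation summarized above, the following sketch perturbs an input so that a plain gradient ("saliency") explanation approaches a chosen target map while the network output is held approximately constant. The helper names, the Adam optimizer, the weighting `gamma`, and the [0, 1] input range are illustrative assumptions, not the paper's exact procedure; because second derivatives of ReLU networks vanish almost everywhere, a model with smooth activations (e.g., softplus) is assumed.

```python
import torch
import torch.nn.functional as F

def gradient_explanation(model, x):
    """Saliency map h(x): gradient of the top class score w.r.t. the input.

    create_graph=True keeps the graph so we can differentiate through h(x).
    """
    score = model(x).max(dim=1).values.sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    return grad

def manipulate(model, x, target_map, steps=500, lr=1e-3, gamma=1e3):
    """Minimize ||h(x_adv) - target||^2 + gamma * ||f(x_adv) - f(x)||^2.

    gamma trades off explanation similarity against output preservation;
    its value here is an illustrative assumption, as is the loss form.
    """
    with torch.no_grad():
        orig_out = model(x)  # output to keep approximately constant
    x_adv = x.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        h = gradient_explanation(model, x_adv)
        loss = (F.mse_loss(h, target_map)
                + gamma * F.mse_loss(model(x_adv), orig_out))
        loss.backward()
        opt.step()
        x_adv.data.clamp_(0.0, 1.0)  # assumes inputs normalized to [0, 1]
    return x_adv.detach()
```

Note that the optimization requires second derivatives of the model, which is why a smooth surrogate for ReLU is assumed above; this observation also foreshadows the geometric analysis in the paper.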