Feature Prioritization and Regularization Improve Standard Accuracy and
Adversarial Robustness
Abstract
Adversarial training has been successfully applied
to build robust models, but at a cost: as
the robustness of a model increases, its standard
classification accuracy declines. This phenomenon
has been suggested to be an inherent trade-off. We propose a model that employs feature prioritization by
a nonlinear attention module and L2 feature regularization to improve the adversarial robustness and
the standard accuracy relative to adversarial training. The attention module encourages the model to
rely heavily on robust features by assigning larger
weights to them while suppressing non-robust features. The regularizer encourages the model to extract similar features for the natural and adversarial images, effectively ignoring the added perturbation. In addition to evaluating the robustness of
our model, we provide justification for the attention
module and propose a novel experimental strategy
that quantitatively demonstrates that our model is
almost ideally aligned with salient data characteristics. Additional experimental results illustrate the
power of our model relative to state-of-the-art
methods.
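
As a rough illustration of the two ingredients named in the abstract, the following Python sketch scales each feature by a nonlinear attention weight and adds a squared L2 penalty on the gap between natural and adversarial features to a classification loss. The sigmoid gating, the function names, and the weight `lam` are assumptions made for illustration only; the abstract does not specify the form of the attention module or how the regularizer is weighted.

```python
import math


def sigmoid(z):
    """Logistic function, used here as a stand-in nonlinearity (assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(features, gate_logits):
    """Scale each feature by an attention weight in (0, 1).

    Hypothetical gating: a large logit keeps a feature, a very negative
    logit suppresses it. The actual attention module may differ.
    """
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def l2_feature_penalty(feat_nat, feat_adv):
    """Squared L2 distance between natural and adversarial feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(feat_nat, feat_adv))


def total_loss(class_loss, feat_nat, feat_adv, lam=1.0):
    """Classification loss plus the weighted feature regularizer.

    `lam` is a hypothetical trade-off weight, not a value from the paper.
    """
    return class_loss + lam * l2_feature_penalty(feat_nat, feat_adv)
```

In this sketch the penalty vanishes when the natural and adversarial features coincide, which corresponds to the stated goal of extracting similar features for both inputs and thereby ignoring the added perturbation.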