Abstract
Carefully crafted, often imperceptible adversarial perturbations have been shown to cause state-of-the-art models to yield extremely inaccurate outputs, rendering them unsuitable for safety-critical application domains. In addition, recent work has shown that constraining the attack space to a low frequency regime is particularly effective. Yet it remains unclear whether this is due to generally constraining the attack search space or to specifically removing high frequency components from consideration. By systematically controlling the frequency components of the perturbation and evaluating against the top-placing defense submissions in the NeurIPS 2017 competition, we empirically show that performance improvements in both the white-box and black-box transfer settings are yielded only when low frequency components are preserved. In fact, the defended models based on adversarial training are roughly as vulnerable to low frequency perturbations as undefended models, suggesting that the purported robustness of state-of-the-art ImageNet defenses relies upon adversarial perturbations being high frequency in nature. We do find that under the ℓ∞-norm constraint of 16/255, the competition distortion bound, low frequency perturbations are indeed perceptible. This calls into question the use of the ℓ∞-norm, in particular, as a distortion metric, and, in turn, suggests that explicitly considering the frequency space is promising for learning robust models which better align with human perception.
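The abstract's "systematically controlling the frequency components of the perturbation" can be illustrated with a minimal sketch. The paper does not specify its exact procedure here; the snippet below is one hypothetical way to restrict a perturbation to low frequencies, using a 2-D discrete cosine transform and zeroing all coefficients outside a top-left (low frequency) square. The function name and the `keep_frac` parameter are illustrative, not from the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn


def low_frequency_project(delta, keep_frac=0.25):
    """Project a 2-D perturbation onto its low-frequency DCT components.

    Keeps only the top-left keep_frac fraction of DCT coefficients
    (the lowest spatial frequencies) and zeroes the rest, then
    inverts the transform back to pixel space.
    """
    h, w = delta.shape
    coeffs = dctn(delta, norm="ortho")          # orthonormal 2-D DCT
    kh = max(1, int(h * keep_frac))             # rows of low-freq block kept
    kw = max(1, int(w * keep_frac))             # cols of low-freq block kept
    mask = np.zeros_like(coeffs)
    mask[:kh, :kw] = 1.0                        # retain low frequencies only
    return idctn(coeffs * mask, norm="ortho")   # back to pixel space


# Example: filter a random perturbation down to its low-frequency part.
rng = np.random.default_rng(0)
delta = rng.standard_normal((32, 32))
low = low_frequency_project(delta, keep_frac=0.25)

# The operation is a projection, so applying it twice changes nothing.
assert np.allclose(low_frequency_project(low, keep_frac=0.25), low)
```

In an attack loop, such a projection would be applied to the gradient or to the accumulated perturbation at each step, so the search never leaves the low frequency regime being studied.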