Abstract
Adversarial patch attacks are among the most practical threat models against real-world computer vision systems. This paper studies the certified and empirical performance of defenses against patch attacks. We begin with a set of experiments showing that most existing defenses, which work by pre-processing input images to mitigate adversarial noise, are easily broken by simple white-box adversaries. Motivated by this finding, we present an approach for certified defense against patch attacks and propose methods for fast training of these models. Finally, we evaluate against patches of different shapes at test time and observe that robustness transfers across shapes surprisingly well.