Abstract
We propose a method to generate audio adversarial
examples that can attack a state-of-the-art speech
recognition model in the physical world. Previous
work assumes that the generated adversarial examples are fed directly to the recognition model and therefore cannot mount such a physical attack, because the playback environment introduces reverberation and noise. In contrast, our method obtains robust adversarial examples by simulating the transformations
caused by playback or recording in the physical
world and incorporating them into
the generation process. An evaluation and a listening experiment demonstrated that our adversarial examples can attack the model without being noticed by human listeners. This result suggests that audio adversarial
examples generated by the proposed method may
become a real threat.
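For concreteness, the generation process summarized above can be viewed as an expectation-over-transformation style optimization: the perturbation is trained against randomly simulated playback transformations rather than the clean waveform. The sketch below is a minimal illustration under assumed details, not the paper's implementation; the loss function asr_loss, the impulse-response set, and all hyperparameters are hypothetical names introduced for this example.

import random
import torch
import torch.nn.functional as F

def simulate_playback(audio, impulse_responses, noise_std=0.01):
    """Approximate over-the-air playback: convolve the waveform with a
    random room impulse response (reverberation) and add recording noise."""
    ir = random.choice(impulse_responses)
    # conv1d computes cross-correlation, so flip the IR for true convolution.
    reverbed = F.conv1d(audio.view(1, 1, -1),
                        ir.flip(0).view(1, 1, -1),
                        padding=ir.numel() - 1).view(-1)[: audio.numel()]
    return reverbed + noise_std * torch.randn_like(reverbed)

def generate_robust_example(audio, target, asr_loss, impulse_responses,
                            steps=1000, lr=1e-3, eps=0.05):
    """Optimize a perturbation whose loss stays low in expectation over
    simulated playback transformations, so the attack survives replay."""
    delta = torch.zeros_like(audio, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Each step sees a freshly sampled transformation of the attacked audio.
        transformed = simulate_playback(audio + delta, impulse_responses)
        loss = asr_loss(transformed, target)  # e.g. CTC loss toward target text
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation small and hard to notice
    return (audio + delta).detach()

Reverberation and noise are the transformations the abstract names; an actual attack on a specific recognizer would additionally require differentiating through that model's feature extraction, which this sketch leaves to the assumed asr_loss.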