Abstract. Recent work has shown that deep neural networks are highly
sensitive to tiny perturbations of input images, giving rise to adversarial examples. Though this property is usually considered a weakness of
learned models, we explore whether it can be beneficial. We find that
neural networks can learn to use invisible perturbations to encode a rich
amount of useful information. In fact, one can exploit this capability for
the task of data hiding. We jointly train encoder and decoder networks, where, given an input message and cover image, the encoder produces a visually indistinguishable encoded image, from which the decoder can recover the original message.
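(As a hedged illustration only: the following PyTorch-style sketch shows one way such a joint training step could be wired up. The encoder, decoder, lam, and the particular loss weighting are assumptions for exposition, not the paper's exact configuration.)

import torch
import torch.nn.functional as F

def train_step(encoder, decoder, optimizer, cover, message, lam=0.7):
    # cover:   (B, 3, H, W) image batch in [0, 1]
    # message: (B, L) float tensor of bits in {0., 1.}
    # lam balances invisibility against recoverability (assumed value).
    encoded = encoder(cover, message)        # should stay close to cover
    logits = decoder(encoded)                # (B, L) bit predictions

    image_loss = F.mse_loss(encoded, cover)  # keep the perturbation invisible
    message_loss = F.binary_cross_entropy_with_logits(logits, message)
    loss = message_loss + lam * image_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because both loss terms are differentiable, gradients flow from the decoder's bit errors back through the encoded image into the encoder.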
We show that these encodings are competitive with existing data hiding algorithms, and further that they can be made robust to noise: our models learn to reconstruct hidden information in an encoded image despite the presence of Gaussian blurring, pixel-wise dropout, cropping, and JPEG compression.
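(Illustrative sketch: distortions of this kind can be implemented as differentiable layers placed between encoder and decoder during training. The functions and parameter values below are assumptions rather than the paper's exact noise layers; note that cropping changes spatial size, so a decoder tolerant of variable input sizes is assumed.)

import torch
import torch.nn.functional as F

def dropout_noise(encoded, cover, keep_prob=0.7):
    # Pixel-wise dropout: randomly revert encoded pixels to the cover image.
    mask = (torch.rand_like(encoded[:, :1]) < keep_prob).float()
    return mask * encoded + (1.0 - mask) * cover

def crop_noise(encoded, frac=0.5):
    # Keep a random square crop covering roughly frac of the image area.
    _, _, h, w = encoded.shape
    ch, cw = int(h * frac ** 0.5), int(w * frac ** 0.5)
    top = torch.randint(0, h - ch + 1, (1,)).item()
    left = torch.randint(0, w - cw + 1, (1,)).item()
    return encoded[:, :, top:top + ch, left:left + cw]

def gaussian_blur(encoded, sigma=1.0, k=5):
    # Depthwise convolution with a fixed (non-learned) Gaussian kernel.
    c = encoded.size(1)
    x = torch.arange(k, dtype=torch.float32) - k // 2
    g = torch.exp(-x ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(c, 1, k, k).contiguous()
    return F.conv2d(encoded, kernel.to(encoded.device),
                    padding=k // 2, groups=c)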
Even though JPEG is non-differentiable, we show that a robust model can be trained using differentiable approximations.
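(One plausible differentiable stand-in for JPEG, sketched under assumptions: block-wise DCT with high-frequency coefficients masked to zero, omitting the true, non-differentiable quantization and rounding. The 8x8 blocking and the keep threshold are illustrative choices, not necessarily the paper's exact approximation.)

import math
import torch

def dct_matrix(n=8):
    # Orthonormal DCT-II basis as an n x n matrix.
    k = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] *= 1.0 / math.sqrt(2)
    return basis * math.sqrt(2.0 / n)

def jpeg_mask(x, keep=3, n=8):
    # Zero all but the lowest keep x keep DCT coefficients per n x n block.
    # x: (B, C, H, W) with H and W divisible by n.
    b, c, h, w = x.shape
    d = dct_matrix(n).to(x.device)
    blocks = x.reshape(b, c, h // n, n, w // n, n)
    coeffs = torch.einsum('ij,bcpjqk,lk->bcpiql', d, blocks, d)  # 2-D DCT
    mask = torch.zeros(n, n, device=x.device)
    mask[:keep, :keep] = 1.0
    coeffs = coeffs * mask[:, None, :]        # align with block-frequency dims
    out = torch.einsum('ji,bcpjqk,kl->bcpiql', d, coeffs, d)     # inverse DCT
    return out.reshape(b, c, h, w)

Since the mask is a fixed elementwise multiplication and the DCT is a linear map, gradients pass through this layer unimpeded.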
Finally, we demonstrate that adversarial training improves the visual quality of encoded images.
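(A hedged sketch of one common way to realize such an adversarial term: a discriminator is trained to tell cover from encoded images, while the encoder is additionally penalized whenever the discriminator succeeds. The names and loss form below are assumptions for illustration.)

import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, cover, encoded):
    # Discriminator term: classify cover images as real, encoded as fake.
    real = discriminator(cover)              # (B, 1) logits
    fake = discriminator(encoded.detach())   # detach: this term updates D only
    d_loss = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
              + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

    # Encoder term: encoded images should look "real" to the discriminator,
    # pushing the perturbation toward visual indistinguishability.
    g_logits = discriminator(encoded)
    g_loss = F.binary_cross_entropy_with_logits(g_logits,
                                                torch.ones_like(g_logits))
    return d_loss, g_loss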