Abstract
To see is to sketch – free-hand sketching naturally builds
ties between human and machine vision. In this paper, we
present a novel approach for translating an object photo to
a sketch, mimicking the human sketching process. This is
an extremely challenging task because the photo and sketch
domains differ significantly. Furthermore, human sketches
exhibit various levels of sophistication and abstraction even
when depicting the same object instance in a reference
photo. This means that even if photo-sketch pairs are available, they only provide a weak supervision signal to learn
a translation model. Compared with existing supervised
approaches that solve the problem of D(E(photo)) →
sketch, where E(·) and D(·) denote the encoder and decoder
respectively, we take advantage of the inverse problem (e.g.,
D(E(sketch)) → photo), and combine these with the unsupervised learning tasks of within-domain reconstruction, all
within a multi-task learning framework. Compared with