Abstract. We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks. The compatibility of a foreground object and a background scene depends on various aspects such as semantics, surrounding
context, geometry, style and color. However, existing image search techniques measure similarity along only a few of these aspects and may return many results that are unsuitable for compositing. Moreover, the importance of each factor can vary across object categories and image
content, making it difficult to manually define the matching criteria. In
this paper, we propose to learn separate feature representations for
foreground objects and background scenes, in which image content and
object category information are jointly encoded during training. As a
result, the learned features can adaptively encode the most important
compatibility factors. We project the features to a common embedding
space, so that the compatibility scores can be easily measured using the
cosine similarity, enabling very efficient search. We collect an evaluation
set consisting of eight object categories commonly used in compositing
tasks, on which we demonstrate that our approach significantly outperforms other search techniques.
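At retrieval time, the shared embedding reduces compatibility scoring to dot products between L2-normalized vectors, so the whole foreground bank can be ranked in one matrix-vector product. The following is a minimal sketch of this scoring step only, not the paper's implementation; the embedding dimensionality, bank size, and function names are illustrative assumptions.

```python
import numpy as np

def compatibility_scores(bg_feat, fg_feats):
    """Cosine similarity between one background embedding and a
    bank of foreground embeddings (one embedding per row)."""
    bg = bg_feat / np.linalg.norm(bg_feat)
    fg = fg_feats / np.linalg.norm(fg_feats, axis=1, keepdims=True)
    return fg @ bg  # one dot product per foreground candidate

def search(bg_feat, fg_feats, top_k=10):
    """Return indices and scores of the top-k compatible foregrounds."""
    scores = compatibility_scores(bg_feat, fg_feats)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Hypothetical example: 100k pre-computed foreground embeddings
# in a 128-D shared space, queried with one background embedding.
rng = np.random.default_rng(0)
fg_bank = rng.normal(size=(100_000, 128)).astype(np.float32)
bg_query = rng.normal(size=128).astype(np.float32)
idx, scores = search(bg_query, fg_bank)
```

Because the foreground embeddings can be pre-computed offline, each query costs only a normalization and a single matrix-vector product, which is what makes the search efficient in practice.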