Abstract. Outfits in online fashion data are composed of items of many
different types (e.g. top, bottom, shoes) that share some stylistic relationship with one another. A representation for building outfits requires
a method that can learn both notions of similarity (for example, when
two tops are interchangeable) and compatibility (items of possibly different type that can go together in an outfit). This paper presents an
approach to learning an image embedding that respects item type, and
jointly learns notions of item similarity and compatibility in an end-to-end model. To evaluate the learned representation, we crawled 68,306
outfits created by users on the Polyvore website. Our approach obtains
a 3-5% improvement over the state of the art on outfit compatibility prediction and fill-in-the-blank tasks, on both our dataset and an established smaller dataset, while supporting a variety of useful queries.
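To make the core idea concrete, below is a minimal sketch of a type-aware triplet objective of the kind the abstract describes: a general embedding is projected into a type-pair subspace before computing a margin-based compatibility loss. All names, the masking scheme, and the fixed margin are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def type_subspace(embedding, mask):
    """Project a general embedding into a type-pair subspace.

    Here the projection is a simple elementwise mask; the mask itself
    would be learned per type pair in a real model (illustrative only).
    """
    return embedding * mask

def triplet_compatibility_loss(anchor, positive, negative, mask, margin=0.2):
    """Hinge triplet loss computed in the type-pair subspace.

    Pulls a compatible (anchor, positive) pair closer than an
    incompatible (anchor, negative) pair by at least `margin`.
    """
    a = type_subspace(anchor, mask)
    p = type_subspace(positive, mask)
    n = type_subspace(negative, mask)
    d_pos = np.sum((a - p) ** 2)  # squared distance to compatible item
    d_neg = np.sum((a - n) ** 2)  # squared distance to incompatible item
    return max(0.0, d_pos - d_neg + margin)

# Toy usage: a compatible pair already closer than the negative
# yields zero loss; the reversed triplet is penalized.
top = np.array([1.0, 0.0, 2.0])       # e.g. embedding of a top
shoe_ok = np.array([1.0, 0.0, 2.1])   # compatible shoe
shoe_bad = np.array([3.0, 0.0, -1.0]) # incompatible shoe
pair_mask = np.array([1.0, 0.0, 1.0]) # hypothetical top-shoe subspace mask
print(triplet_compatibility_loss(top, shoe_ok, shoe_bad, pair_mask))
```

In the full model, such a loss would be combined with a similarity term (e.g. for interchangeable tops) and trained end-to-end with the image encoder; the sketch only isolates the type-conditioned distance computation.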