Abstract
Most approaches for discovering visual attributes in images demand significant supervision, which is cumbersome to obtain. In this paper, we aim to discover visual attributes in a weakly supervised setting that is commonly encountered with contemporary image search engines. For instance, given a noun (say forest) and its associated attributes (say dense, sunlit, autumn), search engines can now generate many valid images for any attribute-noun pair (dense forests, autumn forests, etc.). However, the images for an attribute-noun pair carry no information about the other attributes (such as which autumn forests are also dense). Thus, a weakly supervised scenario occurs: each of the M attributes corresponds to a class such that a training image in class m ∈ {1, ..., M} contains a single label that indicates the presence of the mth attribute only. The task is to discover all the attributes present in a test image.