Abstract
What constitutes a scene? Defining a meaningful vocabulary for scene discovery is a challenging problem that has important consequences for object recognition. We consider scenes to depict correlated objects and present visual similarity. We introduce a max-margin factorization model that finds a low di- mensional subspace with high discriminative power for correlated annotations. We postulate this space should allow us to discover a large number of scenes in unsupervised data; we show scene discrimination results on par with supervised approaches. This model also produces state of the art word prediction results in- cluding good annotation completion.