Abstract
In this work, we present a new framework for person recognition in photo albums that exploits contextual cues at multiple levels, spanning individual persons, individual photos, and photo groups. Through experiments, we show that the information available at each of these distinct contextual levels provides complementary cues as to person identities. At the person level, we leverage clothing and body appearance in addition to facial appearance, which compensates for instances where faces are not visible. At the photo level, we leverage a learned prior on the joint distribution of identities in the same photo to guide identity assignments. Going beyond a single photo, we infer natural groupings of photos with shared context in an unsupervised manner. By exploiting this shared contextual information, we reduce the identity search space and exploit the higher intra-personal appearance consistency within photo groups. Our new framework enables efficient use of these complementary multi-level contextual cues to improve overall recognition rates on the photo album person recognition task, as demonstrated through state-of-the-art results on a challenging public dataset. Our results outperform competing methods by a significant margin, while remaining computationally efficient and practical for real-world applications.