Abstract
In the semantic multinomial framework patches and im-ages are modeled as points in a semantic probability sim-plex.Patch theme models are learned resorting to weak supervision via image labels,which leads the problem of scene categories co-occurring in this semantic space.For-tunately,each category has its own co-occurrence patterns that are consistent across the images in that category.Thus,discovering and modeling these patterns is critical to im-prove the recognition performance in this representation.In this paper, we observe that not only global co-occurrences at the image-level are important,but also diferent regions have diferent category co-occurrence patterns.We exploit local contextual relations to address the problem of discov-ering consistent co-occurrence patterns and removing noisy ones.Our hypothesis is that a less noisy semantic represen-tation,would greatly help the classifier to model consistent co-occurrences and discriminate better between scene cat-egories.An important advantage of modeling features in a semantic space is that this space is feature independent.Thus,we can combine multiple features and spatial neigh-bors in the same common space,and formulate the problem as minimizing a context-dependent energy.Experimental results show that exploiting different types of contextual re-lations consistently improves the recognition accuracy.In particular,larger datasets benefit more from the proposed method,leading to very competitive performance.