Seeing What Is Not There:
Learning Context to Determine Where Objects Are Missing
Abstract
Most work in computer vision focuses on what is in an image. We propose to train a standalone object-centric context representation to perform the opposite task: seeing what is not there. Given an image, our context model can predict where objects should exist, even when no object instances are present. Combined with object detection results, we can perform a novel vision task: finding where objects are missing in an image. Our model is based on a convolutional neural network structure. With a specially designed training strategy, the model learns to ignore objects and focus on context only. It is fully convolutional and thus highly efficient. Experiments show the effectiveness of the proposed approach in one important accessibility task: finding city street regions where curb ramps are missing, which could help millions of people with mobility disabilities.
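The core combination step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: it assumes the context model outputs a per-location probability map and the detector outputs a binary mask of detected instances, and flags locations where context expects an object but none was detected.

```python
import numpy as np

def find_missing_regions(context_prob, detection_mask, threshold=0.5):
    """Flag locations where the context model expects an object
    (probability above threshold) but the detector found no instance.

    context_prob: float array in [0, 1], per-location object likelihood
                  predicted from context alone (hypothetical output).
    detection_mask: bool array, True where the detector fired.
    Returns a bool array marking candidate missing-object regions.
    """
    expected = context_prob > threshold
    return expected & ~detection_mask

# Toy 4x4 example: context expects objects in the top-left corner,
# but the detector fired at only one of those locations.
context_prob = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0],
])
detection_mask = np.zeros((4, 4), dtype=bool)
detection_mask[0, 0] = True  # one detected object instance

missing = find_missing_regions(context_prob, detection_mask)
```

In this toy setup, `missing` is True at the high-context locations with no detection (e.g. position (0, 1)) and False where the object was actually detected (position (0, 0)).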