Abstract
Reasoning about objects and their affordances is a fundamental problem for visual intelligence. Most previous work casts this problem as a classification task, where separate classifiers are trained to label objects, recognize attributes, or assign affordances. In this work, we consider the problem of object affordance reasoning using a knowledge base representation. Diverse information about objects is first harvested from images and other meta-data sources. We then learn a knowledge base (KB) using a Markov Logic Network (MLN). Given the learned KB, we show that a diverse set of visual inference tasks can be performed in this unified framework without training separate classifiers, including zero-shot affordance prediction and object recognition given human poses.
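As background for the MLN formulation (this is the standard Markov Logic construction, not the paper's specific rule set), an MLN defines a distribution over possible worlds through weighted first-order formulas:

\[ P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big), \]

where \(n_i(x)\) counts the true groundings of formula \(F_i\) in world \(x\), \(w_i\) is that formula's learned weight, and \(Z\) is the partition function. A KB entry for affordance reasoning could then take the form of a weighted rule such as

\[ w : \; \mathrm{hasAttribute}(o, \mathit{graspable}) \wedge \mathrm{isA}(o, \mathit{cup}) \Rightarrow \mathrm{affords}(o, \mathit{drink}), \]

where the predicate and constant names are illustrative assumptions only; under this view, tasks such as zero-shot affordance prediction reduce to probabilistic inference over the same set of weighted rules rather than separately trained classifiers.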