ShapeStacks: Learning Vision-Based Physical
Intuition for Generalised Object Stacking
Abstract. Physical intuition is pivotal for intelligent agents to perform
complex tasks. In this paper we investigate the passive acquisition of
an intuitive understanding of physical principles as well as the active
utilisation of this intuition in the context of generalised object stacking.
To this end, we provide ShapeStacks: a simulation-based dataset featuring
20,000 stack configurations composed of a variety of elementary
geometric primitives, richly annotated with respect to semantics and structural
stability. We train visual classifiers for binary stability prediction on the
ShapeStacks data and scrutinise their learned physical intuition. Due to
the richness of the training data, our approach also generalises favourably
to real-world scenarios, achieving state-of-the-art stability prediction on a
publicly available benchmark of block towers. We then leverage the physical
intuition learned by our model to actively construct stable stacks and
observe the emergence of an intuitive notion of stackability, an inherent
object affordance, induced by the active stacking task. Our approach
performs well beyond the stack height observed during training and
even manages to counterbalance initially unstable structures.