Abstract. Objects are made of parts, each with distinct geometry,
physics, functionality, and affordances. Developing such a distributed,
physical, interpretable representation of objects will help intelligent
agents better explore and interact with the world. In this paper, we
study physical primitive decomposition—understanding an object through
its components, each with physical and geometric attributes. As annotated
data for object parts and physics are rare, we propose a novel
formulation that learns physical primitives by explaining both an object’s
appearance and its behaviors in physical events. Our model performs well
on block towers and tools in both synthetic and real scenarios; we also
demonstrate that visual and physical observations often provide
complementary signals. We further present ablation and behavioral studies to
better understand our model and contrast it with human performance.