Abstract
We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, to obtain
large-scale 2D/3D image data with perfect per-pixel
ground truth. An attributed spatial And-Or graph (S-AOG)
is proposed to represent indoor scenes. The S-AOG is a
probabilistic grammar model, in which the terminal nodes
are object entities including room, furniture, and supported
objects. Human contexts as contextual relations are encoded by Markov Random Fields (MRF) on the terminal
nodes. We learn the distributions from an indoor scene
dataset and sample new layouts using Markov chain Monte
Carlo (MCMC). Experiments demonstrate that the proposed method
can robustly sample a large variety of realistic room layouts, as measured by three criteria: (i) visual realism compared to
a state-of-the-art room arrangement method, (ii) accuracy
of the affordance maps with respect to ground-truth, and
(iii) the functionality and naturalness of synthesized rooms
evaluated by human subjects.
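
To make the sampling step concrete, the following is a minimal sketch of Metropolis-Hastings sampling over a toy room layout. It is not the S-AOG model or its learned distributions: the energy function, the square room size, the minimum gap between objects, and the names `energy` and `sample_layout` are all assumptions introduced for illustration only. The sketch shows only the general pattern of proposing a local change to one object and accepting it with a probability that depends on the change in energy.

```python
import math
import random


def energy(layout, room=4.0, min_gap=1.0):
    """Toy energy (hypothetical): penalize objects outside a square room
    and pairs of objects closer together than min_gap."""
    total = 0.0
    for i, (x, y) in enumerate(layout):
        # Penalty grows linearly with distance outside [0, room].
        for v in (x, y):
            if v < 0.0:
                total += -v
            elif v > room:
                total += v - room
        # Pairwise penalty for objects that crowd each other.
        for (u, w) in layout[i + 1:]:
            d = math.hypot(x - u, y - w)
            if d < min_gap:
                total += min_gap - d
    return total


def sample_layout(n_objects=3, steps=5000, temperature=0.05,
                  step_size=0.3, seed=0):
    """Metropolis-Hastings with a symmetric Gaussian proposal on one object."""
    rng = random.Random(seed)
    # Deliberately poor start: every object stacked at the room center.
    layout = [(2.0, 2.0)] * n_objects
    current = energy(layout)
    for _ in range(steps):
        i = rng.randrange(n_objects)
        x, y = layout[i]
        proposal = list(layout)
        proposal[i] = (x + rng.gauss(0.0, step_size),
                       y + rng.gauss(0.0, step_size))
        proposed = energy(proposal)
        # Accept downhill moves always, uphill moves with
        # probability exp(-(E' - E) / T).
        if proposed <= current or \
                rng.random() < math.exp(-(proposed - current) / temperature):
            layout, current = proposal, proposed
    return layout, current


if __name__ == "__main__":
    final_layout, final_energy = sample_layout()
    print(final_layout, final_energy)
```

In a grammar-based model, the proposal moves would also include changes to the scene structure (for example, adding or removing an object), and the energy would come from the learned relations rather than from the hand-written penalties assumed here.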