Abstract
We present a system that transforms a monocular video
of a soccer game into a moving 3D reconstruction, in which
the players and field can be rendered interactively with a
3D viewer or through an Augmented Reality device. At the
heart of our paper is an approach to estimate the depth
map of each player, using a CNN that is trained on 3D
player data extracted from soccer video games. We compare
with state-of-the-art body pose and depth estimation
techniques, and show results on both synthetic ground-truth
benchmarks and real YouTube soccer footage.