Abstract
This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies and AR/VR. We start with a careful examination of applying existing monocular style transfer methods to the left and right views of stereoscopic images separately.
This reveals that the original disparity consistency cannot be well preserved in the final stylization results, which causes 3D fatigue for viewers. To address this issue, we incorporate a new disparity loss into the widely adopted style loss function by enforcing a bidirectional disparity constraint in non-occluded regions.
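As a schematic illustration only (the abstract does not state the loss; the symbols below are our own), one plausible form of such a bidirectional disparity loss is

\mathcal{L}_{\mathrm{disp}} = \sum_{p} M_l(p)\,\big\| O_l(p) - \mathcal{W}_{r\to l}(O_r)(p) \big\|^2 + \sum_{p} M_r(p)\,\big\| O_r(p) - \mathcal{W}_{l\to r}(O_l)(p) \big\|^2,

where O_l and O_r are the stylized left and right views, \mathcal{W}_{r\to l} warps the right view into the left view (and vice versa) using the estimated disparity, and M_l, M_r are masks that are 1 in non-occluded regions and 0 elsewhere.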
For a practical real-time solution, we propose the first feed-forward network for this task, built by
jointly training a stylization sub-network and a disparity
sub-network and integrating them in a feature-level middle domain. Our disparity sub-network is also the first end-to-end network for simultaneous bidirectional disparity and occlusion mask estimation.
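To make the feature-level integration concrete, the following is a minimal PyTorch sketch, under our own assumptions (the fusion rule, names, and disparity sign convention are illustrative, not the paper's exact formulation), of warping right-view features into the left view with the estimated disparity and blending them under the occlusion mask:

import torch
import torch.nn.functional as F

def warp_to_left(feat_r, disp_l):
    # feat_r: (N, C, H, W) right-view features; disp_l: (N, 1, H, W) left-view disparity in pixels
    n, c, h, w = feat_r.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32, device=disp_l.device),
                            torch.arange(w, dtype=torch.float32, device=disp_l.device),
                            indexing="ij")
    xs = xs.unsqueeze(0) - disp_l[:, 0]          # sample right view at x - d (sign is an assumption)
    ys = ys.unsqueeze(0).expand_as(xs)
    grid = torch.stack([2 * xs / (w - 1) - 1,    # normalize coordinates to [-1, 1] for grid_sample
                        2 * ys / (h - 1) - 1], dim=-1)
    return F.grid_sample(feat_r, grid, align_corners=True)

def fuse_left(feat_l, feat_r, disp_l, occ_l):
    # occ_l: (N, 1, H, W) mask, 1 where a left pixel is also visible in the right view
    warped = warp_to_left(feat_r, disp_l)
    # keep left features where occluded; average the two views where both are visible
    return (1 - occ_l) * feat_l + occ_l * 0.5 * (feat_l + warped)

Any such fusion presupposes the bidirectional disparity and occlusion masks that the disparity sub-network estimates.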
Finally, our network is effectively extended to stereoscopic videos by considering both temporal coherence and disparity consistency.
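Read as an illustrative sketch rather than the paper's stated objective, this video extension amounts to a combined loss

\mathcal{L} = \mathcal{L}_{\mathrm{content}} + \lambda_s \mathcal{L}_{\mathrm{style}} + \lambda_d \mathcal{L}_{\mathrm{disp}} + \lambda_t \mathcal{L}_{\mathrm{temp}},

where \mathcal{L}_{\mathrm{temp}} penalizes the difference between the current stylized frame and the previous stylized frame warped by optical flow, and the weights \lambda_s, \lambda_d, \lambda_t are hyperparameters we introduce for illustration.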
We show that the proposed method clearly outperforms the baseline algorithms both quantitatively and qualitatively.