Abstract. Shallow Depth-of-Field (DoF) is a desirable effect in photography which renders artistic photos. Usually, it requires single-lens
reflex cameras and certain photography skills to generate such effects.
Recently, dual-lens cameras on cellphones have been used to estimate scene
depth and simulate DoF effects for portrait shots. However, this technique
cannot be applied to photos already taken and does not work well for
whole-body scenes where the subject is at a distance from the cameras. In this
work, we introduce an automatic system that achieves portrait DoF rendering for monocular cameras. Specifically, we first exploit Convolutional
Neural Networks to estimate the relative depth and portrait segmentation maps from a single input image. Since these initial estimates from
a single input are usually coarse and lack fine details, we further learn
pixel affinities to refine the coarse estimation maps. With the refined
estimation, we apply depth- and segmentation-aware blur rendering to
the input image using a Conditional Random Field and image matting.
In addition, we train a spatially-variant Recursive Neural Network to
learn and accelerate this rendering process. We show that the proposed
algorithm can effectively generate portraitures with realistic DoF effects
from a single input image. Experimental results also demonstrate that our
depth and segmentation estimation modules perform favorably against
the state-of-the-art methods both quantitatively and qualitatively.
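
As a rough illustration of what depth- and segmentation-aware blur rendering means, the sketch below blurs each pixel by an amount that grows with its distance from a chosen focal depth and keeps the segmented subject sharp. It is a simplified stand-in under stated assumptions, not the method summarized above: it uses a layered box blur in place of the Conditional Random Field, image matting, and learned network, and the names `box_blur`, `render_dof`, `focus_depth`, and `max_radius` are hypothetical.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter over a (2*radius+1)^2 window with edge padding."""
    if radius == 0:
        return img.copy()
    k = 2 * radius + 1
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    # Sum all shifted copies of the image, then normalize by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def render_dof(image, depth, mask, focus_depth, max_radius=4):
    """Assumed inputs: image HxWx3 float, depth HxW in [0, 1],
    mask HxW in [0, 1] where 1 marks the portrait subject."""
    image = image.astype(np.float64)
    # Blur radius grows linearly with distance from the focal depth.
    radius = np.rint(np.abs(depth - focus_depth) * max_radius).astype(int)
    radius = np.clip(radius, 0, max_radius)
    blurred = np.zeros_like(image)
    # Layered rendering: blur once per radius level, pick per-pixel results.
    for r in range(max_radius + 1):
        sel = radius == r
        if sel.any():
            blurred[sel] = box_blur(image, r)[sel]
    # Composite: the subject stays sharp, the rest takes the blurred value.
    alpha = mask[..., None]
    return alpha * image + (1.0 - alpha) * blurred
```

A soft (matted) mask with values between 0 and 1 blends the sharp and blurred images near the subject boundary, which is one reason a refined segmentation matters for the final result.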