Abstract

We consider the problem of two-frame depth from defocus in conditions unsuitable for existing methods yet typical of everyday photography: a non-stationary scene, a handheld cellphone camera, a small aperture, and sparse scene texture. The key idea of our approach is to combine local estimation of depth and flow in very small patches with a global analysis of image content—3D surfaces, deformations, figure-ground relations, textures. To enable local estimation we (1) derive novel defocus-equalization filters that induce brightness constancy across frames and (2) impose a tight upper bound on defocus blur—just three pixels in radius—by appropriately refocusing the camera for the second input frame. For global analysis we use a novel spline-based scene representation that can propagate depth and flow across large irregularly-shaped regions. Our experiments show that this combination preserves sharp boundaries and yields good depth and flow maps in the face of significant noise, non-rigidity, and data sparsity.