Abstract. Autofocus (AF) on smartphones is the process of determining how to move a camera’s lens such that certain scene content is in
focus. The underlying algorithms used by AF systems, such as contrast
detection and phase differencing, are well established. However, determining a high-level objective regarding how to best focus a particular
scene is less clear. This is evidenced in part by the fact that different smartphone cameras employ different AF criteria; for example, some attempt
to keep objects at the center of the frame in focus, others give priority to faces, while
still others maximize the sharpness of the entire scene. The fact that different objectives exist raises the research question of whether there is a
preferred objective. This becomes more interesting when AF is applied to
videos of dynamic scenes. The work in this paper aims to revisit AF for
smartphones within the context of temporal image data. As part of this
effort, we describe the capture of a new 4D dataset that provides access
to a full focal stack at each time point in a temporal sequence. Based on
this dataset, we have developed a platform and associated application
programming interface (API) that mimic real AF systems by restricting
lens motion within the constraints imposed by a dynamic environment and frame
capture. Using our platform, we evaluated several high-level focusing objectives and gained insight into what users prefer. We believe
our new temporal focal stack dataset, AF platform, and initial user-study
findings will be useful in advancing AF research.