The dataset contains over 15K images of 20 people (6 females and 14 males; 4 people were recorded twice). For each frame, a depth image, the corresponding RGB image (both 640x480 pixels), and the annotation are provided. The head pose range covers about ±75 degrees in yaw and ±60 degrees in pitch. Ground truth is provided in the form of the 3D location of the head and its rotation.
Even though our algorithms work on depth images alone, we provide the RGB images as well. Please note that this database was acquired with frame-by-frame estimation in mind, not tracking. For this reason, some frames are missing.
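
Because some frames are missing, code that reads a sequence should not assume consecutive frame numbers. The following is a minimal sketch of how gaps could be detected before processing. It assumes that frames are identified by integer indices, which is not stated in this description; the function name `find_gaps` and the example indices are hypothetical and purely illustrative.

```python
from typing import List, Tuple


def find_gaps(frame_ids: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges.

    Assumption: frames are identified by integer indices. The actual
    naming scheme of the dataset files is not described here.
    """
    ids = sorted(set(frame_ids))
    gaps = []
    # Compare each available index with its successor; a difference
    # larger than 1 means at least one frame is missing in between.
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Hypothetical list of frame indices that are present on disk.
available = [0, 1, 2, 5, 6, 9]
print(find_gaps(available))  # [(3, 4), (7, 8)]
```

Since the data was recorded for frame-by-frame estimation, each available frame can simply be processed independently; a check like this is mainly useful when a method relies on temporal continuity.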