Abstract
We present a novel dataset and a novel algorithm for recognizing activities of daily living (ADL) from a first-person wearable camera. Handled objects are crucially important for egocentric ADL recognition. To examine objects related to a user's actions separately from other objects in the environment, many previous works have addressed the detection of handled objects in images captured from head-mounted and chest-mounted cameras. Nevertheless, detecting handled objects is not always easy because they tend to appear small in images and can be occluded by the user's body. As described herein, we mount a camera on a user's wrist. A wrist-mounted camera can capture handled objects at a large scale, which enables us to skip the object detection process. To compare a wrist-mounted camera with a head-mounted camera, we also developed a novel, publicly available dataset that includes videos and annotations of daily activities captured simultaneously by both cameras. Additionally, we propose a discriminative video representation that retains spatial and temporal information after encoding the frame descriptors extracted by convolutional neural networks (CNN).