Abstract. Predicting future activities from an egocentric viewpoint is
of particular interest in assisted living. However, most state-of-the-art egocentric activity understanding techniques are not capable of
predictive tasks, as their synchronous processing architectures perform
poorly at modeling event dependencies and at pruning temporally redundant features. This work explicitly addresses these issues by proposing
an asynchronous, gaze-event-driven attentive activity prediction network.
This network is built on a gaze-event extraction module, inspired by the
observation that gaze moving onto or away from an object typically signals the start or end of an activity. The extracted gaze
events are fed into: 1) an asynchronous module that reasons about
the temporal dependencies between events, and 2) a synchronous module
that softly attends to informative temporal durations for more compact
and discriminative feature extraction. Both modules are seamlessly integrated for collaborative prediction. Extensive experimental results on
egocentric activity prediction as well as recognition demonstrate the
effectiveness of the proposed method.