Abstract
We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera. Our work will impact several first-person vision tasks that require a detailed understanding of social interactions, such as automatic video summarization of group events and assistive systems. To recognize micro-level actions and reactions, such as slight shifts in attention, subtle nodding, or small hand actions, where only subtle body motion is apparent, we propose to use paired egocentric videos recorded by two interacting people. We show that the first-person and second-person point-of-view features of two people, enabled by paired egocentric videos, are complementary and essential for reliably recognizing micro-actions and reactions. We also build a new dataset of dyadic (two-person) interactions that comprises more than 1000 pairs of egocentric videos to enable systematic evaluations on the task of micro-action and reaction recognition.