Abstract
Human action recognition from videos draws tremendous in- terest in the past many years. In this work, we first find that the trifocal tensor resides in a twelve dimensional subspace of the original space if the first two views are already matched and the fundamental matrix be- tween them is known, which we refer to as subtensor. Then we use the subtensor to perform the task of action recognition under three views. We find that treating the two template views separately or not consid- ering the correspondence relation already known between the first two views omits a lot of useful information. Experiments and datasets are designed to demonstrate the effectiveness and improved performance of the proposed approach.