Abstract. In this paper, we propose a new Dividing and Aggregating
Network (DA-Net) for multi-view action recognition. In our DA-Net,
we learn view-independent representations shared by all views at lower
layers, while we learn one view-specific representation for each view at
higher layers. We then train view-specific action classifiers based on the
view-specific representation for each view and a view classifier based on
the shared representation at lower layers. The view classifier is used to
predict the probability that each video belongs to each view. Finally, the predicted
view probabilities from multiple views are used as the weights when
fusing the prediction scores of view-specific action classifiers. We also
propose a new approach based on the conditional random field (CRF)
formulation to pass messages among view-specific representations from
different branches so that they can help each other. Comprehensive experiments on two
benchmark datasets clearly demonstrate the effectiveness of our proposed
DA-Net for multi-view action recognition.
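The fusion step described above (view probabilities used as weights on the view-specific action scores) can be illustrated with a minimal sketch. It assumes each view-specific branch outputs one score per action class and the view classifier outputs one logit per view. The names `softmax` and `fuse_scores` and the example numbers are hypothetical. The sketch omits the network and the CRF message passing, and it is not the authors' implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: turns view-classifier logits into view probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(view_probs: List[float], view_scores: List[List[float]]) -> List[float]:
    """Weighted sum of per-branch action scores (hypothetical helper).

    view_probs[v]     : predicted probability that the video belongs to view v
    view_scores[v][c] : score for action class c from the classifier of branch v
    """
    if len(view_probs) != len(view_scores):
        raise ValueError("need exactly one score vector per view")
    num_classes = len(view_scores[0])
    fused = [0.0] * num_classes
    for p, scores in zip(view_probs, view_scores):
        for c, s in enumerate(scores):
            fused[c] += p * s  # each branch contributes in proportion to its view probability
    return fused


# Usage: three views, two action classes (made-up numbers for illustration).
view_probs = [0.7, 0.2, 0.1]
view_scores = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
fused = fuse_scores(view_probs, view_scores)  # approximately [0.72, 0.28]
predicted_class = fused.index(max(fused))
```

Because the weights sum to one, a branch whose view the video probably does not come from contributes little to the final score, while the most likely view dominates.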