Abstract. With the abundance of video data, interest in more effective methods for recognizing faces in unconstrained videos has grown. State-of-the-art algorithms for describing an image set use descriptors that are very high-dimensional, sensitive to outliers and image misalignment, or both. In this paper, we represent image sets as dictionaries of Symmetric Positive Definite (SPD) matrices, which are more robust to local deformations and outliers. We then learn a tangent map that transforms the SPD matrix logarithms into a lower-dimensional Log-Euclidean space such that the transformed gallery atoms adhere to a more discriminative subspace structure. A query image set is then classified by first mapping its SPD descriptors into the computed Log-Euclidean tangent space and then using its sparse representation over that tangent space to decide a label for the image set. Experiments on three public video datasets show that the proposed method outperforms many state-of-the-art methods.
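As a rough illustration of the first step of this pipeline, the sketch below builds an SPD descriptor and maps it to a Log-Euclidean vector through the matrix logarithm. It is a minimal sketch under stated assumptions, not the method of the paper: the covariance-based descriptor, the ridge term `eps`, the random stand-in features, and the function names are all hypothetical choices made for illustration, and the learned tangent map and the sparse-representation classifier are not shown.

```python
import numpy as np

def spd_descriptor(features, eps=1e-6):
    """Covariance of feature vectors (one per row), with a small ridge.

    Assumption: a covariance descriptor stands in for whatever SPD
    descriptor is actually used; eps keeps the matrix strictly positive.
    """
    features = np.asarray(features, dtype=float)
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

def spd_log(spd):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    # V diag(log w) V^T; column scaling avoids forming diag explicitly.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_vector(spd):
    """Vectorize log(spd) so the Euclidean norm equals its Frobenius norm."""
    log_m = spd_log(spd)
    upper = np.triu_indices(log_m.shape[0], k=1)
    # Off-diagonal entries appear twice in the symmetric matrix,
    # hence the sqrt(2) weight.
    return np.concatenate([np.diag(log_m), np.sqrt(2.0) * log_m[upper]])

# Hypothetical data: 50 feature vectors of dimension 4 from one image region.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 4))
vec = log_euclidean_vector(spd_descriptor(feats))
```

In such a representation, ordinary Euclidean operations on `vec` (projection to a lower dimension, sparse coding over a dictionary of gallery vectors) correspond to operations in the Log-Euclidean tangent space.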