Abstract
Principal Component Analysis (PCA) is a fundamental
method for estimating a linear subspace approximation to
high-dimensional data. Many algorithms exist in the literature
to compute a statistically robust version of PCA, called robust PCA (RPCA).
In this paper, we present a geometric framework for computing the principal linear subspaces in both the standard and robust settings that
amounts to computing the intrinsic average on the space of
all subspaces (the Grassmann manifold). Points on this manifold are defined as the subspaces spanned by K-tuples of
observations. We show that the intrinsic Grassmann average
of these subspaces coincides with the principal components
of the observations when they are drawn from a Gaussian
distribution. A similar result is shown to hold for
RPCA. Further, we propose an efficient online subspace
averaging algorithm whose complexity is linear in the
number of samples and which has a linear convergence
rate. When the data contain outliers, our proposed online robust subspace averaging algorithm yields significant gains in accuracy and computation time over recently
published RPCA methods with publicly accessible code. We
have demonstrated the competitive performance of our proposed
online subspace averaging algorithm on one synthetic and two
real data sets. Experimental results demonstrating the stability of
our proposed method are also presented. Furthermore, on
two real outlier-corrupted datasets, we present comparison
experiments showing lower reconstruction error with our
online RPCA algorithm. In terms of reconstruction error
and computation time, both our algorithms outperform competing methods.