Abstract
Bayesian treatments of Canonical Correlation Analysis (CCA) -type latent variable models have been recently proposed for coping with overfitting in small sample sizes, as well as for producing factorizations of the data sources into correlated and non-shared effects. However, all of the current implementations of Bayesian CCA and its extensions are computationally inefficient for highdimensional data and, as shown in this paper, break down completely for high-dimensional sources with low sample count. Furthermore, they cannot reliably separate the correlated effects from non-shared ones. We propose a new Bayesian CCA variant that is computationally efficient and works for highdimensional data, while also learning the factorization more accurately. The improvements are gained by introducing a group sparsity assumption and an improved variational approximation. The method is demonstrated to work well on multi-label prediction tasks and in analyzing brain correlates of naturalistic audio stimulation.