Abstract
Gaussian mixture models (GMMs) are powerful parametric tools with many applications in machine learning and
computer vision. Expectation maximization (EM) is the
most popular algorithm for estimating the GMM parameters.
However, EM guarantees only convergence to a stationary
point of the log-likelihood function, which could be arbitrarily worse than the optimal solution. Inspired by the
relationship between the negative log-likelihood function
and the Kullback-Leibler (KL) divergence, we propose an
alternative formulation for estimating the GMM parameters
using the sliced-Wasserstein distance, which gives rise to
a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and
the data distribution with respect to the GMM parameters.
In contrast to the KL divergence, the energy landscape of
the sliced-Wasserstein distance is better behaved and
therefore more suitable for a stochastic gradient descent
scheme to obtain the optimal GMM parameters. We show
that our formulation results in parameter estimates that are
more robust to random initializations and demonstrate that
it can estimate high-dimensional data distributions more
faithfully than the EM algorithm.
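
To illustrate the formulation summarized above, the following is a minimal sketch (not the authors' implementation) of fitting GMM parameters by stochastic gradient descent on a Monte Carlo estimate of the squared sliced-Wasserstein-2 distance. It uses PyTorch for automatic differentiation, exploits the fact that 1-D optimal transport between equal-size projected samples reduces to matching sorted values, and, for brevity, holds the mixture weights uniform and learns only component means and diagonal scales; all names and hyperparameters are illustrative.

```python
# Minimal sketch: fit GMM means/scales by SGD on a sliced-Wasserstein loss.
# Assumptions: uniform mixture weights, diagonal covariances, toy 2-D data.
import torch

torch.manual_seed(0)

# Toy data: samples from an "unknown" two-component mixture.
data = torch.cat([
    torch.randn(500, 2) * 0.5 + torch.tensor([-2.0, 0.0]),
    torch.randn(500, 2) * 0.5 + torch.tensor([2.0, 1.0]),
])

K, d = 2, 2                                          # components, dimension
mu = torch.randn(K, d, requires_grad=True)           # component means
log_sigma = torch.zeros(K, d, requires_grad=True)    # diagonal log-scales
opt = torch.optim.Adam([mu, log_sigma], lr=5e-2)

def sample_gmm(n):
    """Reparameterized samples from the current mixture (uniform weights)."""
    comp = torch.randint(0, K, (n,))                  # component assignments
    eps = torch.randn(n, d)
    return mu[comp] + torch.exp(log_sigma)[comp] * eps

def sliced_w2_sq(x, y, n_proj=50):
    """Monte Carlo estimate of the squared sliced-Wasserstein-2 distance."""
    theta = torch.randn(n_proj, x.shape[1])
    theta = theta / theta.norm(dim=1, keepdim=True)   # random unit directions
    px, py = x @ theta.T, y @ theta.T                 # 1-D projections
    px, _ = torch.sort(px, dim=0)                     # sorting both samples
    py, _ = torch.sort(py, dim=0)                     # solves 1-D transport
    return ((px - py) ** 2).mean()

for step in range(500):
    opt.zero_grad()
    loss = sliced_w2_sq(sample_gmm(len(data)), data)
    loss.backward()
    opt.step()

print("estimated means:\n", mu.detach())
```

In this sketch the gradient flows to the means and scales through the Gaussian reparameterization, so each SGD step moves the mixture toward the empirical data distribution under the sliced-Wasserstein objective rather than the log-likelihood used by EM.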