Abstract
Clustering is the task of grouping a set of objects so thatobjects in the same cluster are more similar to each otherthan to those in other clusters. The crucial step in most clus-tering algorithms is to find an appropriate similarity met-ric, which is both challenging and problem-dependent. Su-pervised clustering approaches, which can exploit labeled clustered training data that share a common metric with the test set, have thus been proposed. Unfortunately, current metric learning approaches for supervised clustering do notscale to large or even medium-sized datasets. In this paper, we propose a new structured Mahalanobis Distance Metric Learning method for supervised clustering. We formulate our problem as an instance of large margin structured prediction and prove that it can be solved very efficiently in closed-form. The complexity of our method is (in most cases) linear in the size of the training dataset. We further reveal a striking similarity between our approach and multivariate linear regression. Experiments on both synthetic and real datasets confirm several orders of magnitude speedup while still achieving state-of-the-art performance.