Abstract
Clustering is a fundamental research topic in data
mining and machine learning. In addition, many
specific applications demand that the clusters obtained be balanced. In this paper, we present a
balanced clustering model that is to minimize the
sum of squared distances to cluster centers, with
uniform regularization functions to control the balance degree of the clustering results. To solve the
model, we adopt the idea of the k-means method.
We show that the k-means assignment step has an
equivalent minimum cost flow formulation when
the regularization functions are all convex. By
using a novel and simple acceleration technique
for the k-means and network simplex methods our
model can be solved quite efficiently. Experimental results over benchmarks validate the advantage
of our algorithm compared to the state-of-the-art
balanced clustering algorithms. On most datasets,
our algorithm runs more than 100 times faster than
previous algorithms with a better solution