Abstract
We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be “robustified” so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well as the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noiserobust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of “robustifying” clustering algorithms fail to achieve similar guarantees.