Abstract
This paper considers the problem of clustering a partially observed unweighted graph, i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining pairs we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters and sparse (observed) connectivity across clusters. We take a novel yet natural approach to this problem by finding the clustering that minimizes the number of "disagreements", i.e., the sum of the number of (observed) missing edges within clusters and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We show that our algorithm succeeds under certain natural assumptions on the optimal clustering and its disagreements. Our results significantly strengthen existing matrix splitting results for the special case of our clustering problem. Our results directly enhance solutions to the problem of Correlation Clustering (Bansal et al., 2002) with partial observations.

This work is supported in part by NSF CAREER grant 095405, NSF grant EFRI-0735905, DTRA grant HDTRA1-08-0029 and NUS startup grant R-265-000-384-133.

Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).
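To make the objective described in the abstract concrete, the following minimal sketch counts the disagreements of a candidate clustering on a small partially observed graph. It only evaluates the objective and does not implement the convex optimization algorithm. The names `count_disagreements`, `labels`, and `observed`, along with the toy graph, are illustrative assumptions and do not come from the paper.

```python
def count_disagreements(labels, observed):
    """Count disagreements of a clustering on a partially observed graph.

    labels:   dict mapping node -> cluster id (hypothetical representation).
    observed: dict mapping an observed node pair (u, v) -> True if an edge
              is known to be present, False if it is known to be absent.
              Unobserved pairs are simply not in the dict.
    """
    total = 0
    for (u, v), has_edge in observed.items():
        same_cluster = labels[u] == labels[v]
        if same_cluster and not has_edge:
            total += 1  # observed missing edge inside a cluster
        elif not same_cluster and has_edge:
            total += 1  # observed present edge across clusters
    return total


# Toy graph on nodes 0..3; pairs (0, 3) and (1, 3) are unobserved.
observed = {(0, 1): True, (0, 2): False, (1, 2): True, (2, 3): True}

two_clusters = {0: "a", 1: "a", 2: "b", 3: "b"}
singletons = {0: "a", 1: "b", 2: "c", 3: "d"}

print(count_disagreements(two_clusters, observed))  # 1: edge (1, 2) crosses clusters
print(count_disagreements(singletons, observed))    # 3: every observed edge crosses clusters
```

Unobserved pairs contribute nothing to the count, which is why they are omitted from `observed` rather than given a default value.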