Abstract
Feature selection is an indispensable preprocessing step for high-dimensional data analysis, but previous feature selection methods usually ignore sample diversity (i.e., the fact that each sample contributes individually to model construction) and have limited ability to handle incomplete data sets, in which some training samples contain unobserved values. To address these issues, in this paper we first propose a robust feature selection framework to relieve the influence of outliers, and then introduce an indicator matrix to exclude unobserved data from the numerical computation of feature selection, so that both our proposed framework and existing feature selection frameworks can be applied to incomplete data sets. We further propose a new optimization algorithm for the resulting objective function and prove that it converges fast. Experimental results on both real and artificial incomplete data sets demonstrate that our proposed method outperforms the compared feature selection methods in terms of clustering performance.