Abstract
Sparse regression methods such as the Lasso have achieved
great success in handling high-dimensional data.
However, one of the biggest practical problems
is that high-dimensional data often contain large
amounts of missing values. Convex Conditioned
Lasso (CoCoLasso) was proposed to handle high-dimensional data with missing values, but it performs poorly when the missing rate is high,
so the high-missing-rate problem remains unresolved. In this paper, we
propose a novel Lasso-type regression method for
high-dimensional data with high missing rates. We
effectively incorporate the mean-imputed covariance,
overcoming its inherent estimation bias. The result
is a modification of CoCoLasso that is optimally weighted according to the missing ratios. We theoretically
and experimentally show that our proposed method
is highly effective even when there are many missing values.
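
As an illustration of the estimation bias mentioned above, the following minimal sketch compares a mean-imputed covariance with a pairwise (available-case) covariance on synthetic data. It is not the proposed method itself. The function name `covariance_estimates`, the use of NaN to mark missing entries, the centering by observed column means, and the synthetic data are all assumptions made for this example. Under these assumptions, each entry of the mean-imputed covariance equals the corresponding pairwise entry multiplied by the fraction of samples in which both variables are observed, so it shrinks toward zero as the missing rate grows.

```python
import numpy as np

def covariance_estimates(X):
    """Return (S_imp, S_pair, R) for a data matrix X where NaN marks a missing entry.

    Hypothetical helper for illustration only. Assumes every pair of columns
    is jointly observed in at least one row.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    observed = ~np.isnan(X)

    # Center each column with the mean of its observed entries.
    Z = X - np.nanmean(X, axis=0)
    # After centering, mean imputation amounts to filling missing entries with 0.
    Z0 = np.where(observed, Z, 0.0)

    M = observed.astype(float)
    counts = M.T @ M          # counts[j, k]: rows where both j and k are observed
    cross = Z0.T @ Z0         # sums of products over jointly observed rows

    S_imp = cross / n         # mean-imputed covariance (divides by all n rows)
    S_pair = cross / counts   # pairwise covariance (divides by jointly observed rows)
    R = counts / n            # pairwise observed ratios
    return S_imp, S_pair, R

# Synthetic example: 200 samples, 5 variables, about half the entries missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.5] = np.nan

S_imp, S_pair, R = covariance_estimates(X)
# The mean-imputed estimate is the pairwise estimate scaled entrywise by R.
print(np.allclose(S_imp, R * S_pair))
```

The matrix `R` makes the dependence on the missing ratios explicit: entries estimated from few jointly observed samples are exactly the ones that mean imputation shrinks most.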