Abstract
We study the problem of Robust Least Squares Regression (RLSR) where several response variables can be adversarially corrupted. More specifically, for a data matrix X and an underlying model w*, the response vector is generated as y = X^T w* + b, where b is the corruption vector supported over at most C · n coordinates. Existing exact recovery results for RLSR focus solely on ℓ1-penalty based convex formulations and impose relatively strict model assumptions, such as requiring the corruptions b to be selected independently of X. In this work, we study a simple hard-thresholding algorithm called TORRENT which, under mild conditions on X, can recover w* exactly even if b corrupts the response variables in an adversarial manner, i.e. both the support and entries of b are selected adversarially after observing X and w*. Our results hold under deterministic assumptions which are satisfied if X is sampled from any sub-Gaussian distribution. Finally, unlike existing results that apply only to a fixed w* generated independently of X, our results are universal and hold for any w*. Next, we propose gradient descent-based extensions of TORRENT that can scale efficiently to large-scale problems, such as high-dimensional sparse recovery, and prove similar recovery guarantees for these extensions. Empirically, we find that TORRENT, and more so its extensions, offer significantly faster recovery than state-of-the-art ℓ1 solvers. For instance, even on moderate-sized datasets (with p = 50K) with around 40% corrupted responses, a variant of our proposed method called TORRENT-HYB is more than 20× faster than the best ℓ1 solver.
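To make the hard-thresholding idea concrete, the following is a minimal NumPy sketch of an alternating scheme of the kind the abstract describes: repeatedly fit least squares on a presumed-clean subset of the points, then re-select the subset as the points with the smallest residuals. The function name `torrent_fc_sketch`, the iteration count, and the parameter `beta` (an assumed upper bound on the fraction of corrupted responses) are illustrative choices, not the paper's exact specification.

```python
import numpy as np

def torrent_fc_sketch(X, y, beta, n_iters=50):
    """Illustrative hard-thresholding robust regression.

    X : (n, p) data matrix, one point per row.
    y : (n,) response vector, a beta-fraction of which may be corrupted.
    beta : assumed upper bound on the fraction of corrupted responses.
    """
    n, p = X.shape
    keep = n - int(beta * n)        # number of points treated as clean
    active = np.arange(n)           # current presumed-clean set
    w = np.zeros(p)
    for _ in range(n_iters):
        # Fully-corrective step: least squares on the active set only.
        w, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        # Hard thresholding: keep the points with the smallest residuals.
        residuals = np.abs(y - X @ w)
        active = np.argsort(residuals)[:keep]
    return w
```

In the noiseless setting, clean points have zero residual under the true model, so once the active set is corruption-free the least-squares step recovers w* exactly, which is the intuition behind the exact-recovery guarantee.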