Abstract
We consider a budgeted learning setting, where the learner can only choose and observe a small subset of the attributes of each training example. We develop efficient algorithms for Ridge and Lasso linear regression, which utilize the geometry of the data via a novel distribution-dependent sampling scheme, and have excess risk bounds which are better by a factor of up to $\sqrt{d/k}$ over the state-of-the-art, where $d$ is the dimension and $k+1$ is the number of observed attributes per example. Moreover, under reasonable assumptions, our algorithms are the first in our setting which can provably use fewer attributes than full-information algorithms, which is the main concern in budgeted learning. We complement our theoretical analysis with experiments which support our claims.