Abstract
Some of the most successful machine learning algorithms, such as Support Vector Machines, are based on learning linear and kernel predictors with respect to a convex loss function, such as the hinge loss. For classification purposes, a more natural loss function is the 0-1 loss. However, using it leads to a non-convex problem for which there is no known efficient algorithm. In this paper, we describe and analyze a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss function. The algorithm is parameterized by L, which quantifies the effective width around the decision boundary in which the predictor may be uncertain. We show that, without any distributional assumptions, and for any fixed L, the algorithm runs in polynomial time and learns a classifier which is worse than the optimal such classifier by at most ε. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in L.
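For reference, the two loss functions contrasted above can be written as follows (a standard formulation in our own notation, not taken verbatim from the paper; here y ∈ {−1, +1} is the label and the prediction is the inner product ⟨w, x⟩):

    \ell_{\mathrm{hinge}}(y, \langle w, x \rangle) = \max\{0,\; 1 - y \langle w, x \rangle\},
    \qquad
    \ell_{0\text{-}1}(y, \langle w, x \rangle) = \mathbf{1}\bigl[\, y \langle w, x \rangle \le 0 \,\bigr].

The hinge loss is a convex upper bound on the 0-1 loss, which is what makes SVM training tractable; the 0-1 loss itself is neither convex nor continuous, which is the source of the computational difficulty addressed here.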