Abstract
Let p be an unknown and arbitrary probability distribution over [0, 1). We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from p and must (with high probability) output a hypothesis distribution that is close to p. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any k and ", we give an algorithm that makes draws from p, runs in time, and outputs a hypothesis distribution h that is piecewise constant with pieces. With high probability the hypothesis h satisfies , where denotes the total variation distance (statistical distance), C is a universal constant, and is the smallest total variation distance between p and any k-piecewise constant distribution. The sample size and running time of our algorithm are optimal up to logarithmic factors. The “approximation factor” C in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of k and " can achieve C < 2 regardless of what kind of hypothesis distribution it uses.