Abstract
We consider the task of estimating the entropy of k-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires samples and a constant O(1) memory words of space and outputs a ± estimate of H(p). Without space limitations, the sample complexity has been established as which k is sub-linear in the domain size k, and the current algorithms that achieve optimal sample complexity also require nearly-linear space in k. Our algorithm partitions [0, 1] into intervals and estimates the entropy contribution of probability values in each interval. The intervals are designed to trade off the bias and variance of these estimates.