Abstract
The data stream model is a fundamental model for processing massive data sets with limited memory and fast processing time. Recently Hsu et al. (2019) incorporated machine learning techniques into the data stream model in order to learn relevant patterns in the input data. Such techniques led to the training of an oracle to predict item frequencies in the streaming model. In this paper we explore the full power of such an oracle, showing that it can be applied to a wide array of problems in data streams, sometimes resulting in the first optimal bounds for such problems, and sometimes bypassing known lower bounds without such an oracle. We apply the oracle to counting distinct elements on the difference of streams, estimating frequency moments, estimating cascaded aggregates, and estimating moments of geometric data streams. For the distinct elements problem, we bypass the known lower bound and obtain a new space-optimal algorithm. For estimating the p-th frequency moment for 0 < p < 2 we obtain the first algorithms with optimal space and update time. For estimating the p-th frequency moment for p > 2 we obtain a quadratic savings in memory, bypassing known lower bounds without an oracle. We empirically validate our results, demonstrating also our improvements in practice.