Abstract
We consider the problem of online forecasting of sequences of length n with total-variation at most Cn using observations contaminated by independent ?subgaussian noise. We design an O(n log n)-time algorithm that achieves a cu2/3 mulative square error of with high probability. We also prove a lower bound that matches the upper bound in all parameters (up to a log(n) factor). To the best of our knowledge, this is the first polynomial-time algorithm that achieves the optimal rate in forecasting total variation bounded sequences and the first algorithm that adapts to unknown Cn . Our proof techniques leverage the special localized structure of Haar wavelet basis and the adaptivity to unknown smoothness parameters in the classical wavelet smoothing [Donoho et al., 1998]. We also compare our model to the rich literature of dynamic regret minimization and nonstationary stochastic optimization, where our problem can be treated as a special case. We show that the workhorse in those settings — online gradient descent and its variants with a fixed restarting schedule — are instances of a class of linear forecasters that require a suboptimal regret of This implies that the use of more adaptive algorithms is necessary to obtain the optimal rate.