Abstract
We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.