Semiparametric Contextual Bandits

2020-03-11

Abstract

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem in which the reward for an action is modeled as a linear function of known action features, confounded by a non-linear, action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over T rounds when the linear function is d-dimensional. This matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenewald et al. (2017). In an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches, and our proofs require new concentration inequalities for self-normalized martingales.
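The reward model described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm or estimator: it assumes the reward of action a in a round is the inner product of an unknown parameter with that action's features, plus a confounder shared by all actions, plus noise. The names `simulate_rewards`, `theta`, `features`, and `confounder` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_rewards(theta, features, confounder, noise_std, rng):
    """Rewards for every action in one round (illustrative model only).

    reward[a] = <theta, features[a]> + confounder + noise[a]
    The confounder is a single scalar shared by all actions.
    """
    noise = rng.normal(0.0, noise_std, size=features.shape[0])
    return features @ theta + confounder + noise


rng = np.random.default_rng(0)
d, num_actions = 5, 4                          # assumed sizes for the demo
theta = rng.normal(size=d)                     # unknown linear parameter
features = rng.normal(size=(num_actions, d))   # known action features

# Same round, two different action-independent confounders, no noise.
r_small = simulate_rewards(theta, features, 0.0, 0.0, rng)
r_large = simulate_rewards(theta, features, 10.0 * np.sin(3.0), 0.0, rng)

# The confounder shifts every reward equally, so the best action and the
# gaps between actions depend only on the linear part.
best_small = int(np.argmax(r_small))
best_large = int(np.argmax(r_large))
gap_small = r_small[0] - r_small[1]
gap_large = r_large[0] - r_large[1]
```

Because the confounder is added to every action's reward, it changes the observed reward level without changing which action is best. That is why an unconfounded linear fit of raw rewards can be misled while the underlying regret-relevant quantity, the gap between actions, remains purely linear in the features.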
