Abstract
The Multi-Armed Bandit problem constitutes an archetypal setting for sequential decision-making, permeating multiple domains including engineering, business, and medicine. One of the hallmarks of a bandit setting is the agent’s capacity to explore its environment through active intervention, which contrasts with collecting passive data from which only associational relationships between actions and payouts can be estimated. The existence of unobserved confounders, namely unmeasured variables that affect both the action and the outcome, implies that these two data-collection modes will in general not coincide. In this paper, we show that formalizing this distinction has conceptual and algorithmic implications for the bandit setting. The current generation of bandit algorithms implicitly tries to maximize rewards based on estimates of the experimental distribution, which we show is not always the best strategy to pursue. Indeed, to achieve low regret in certain realistic classes of bandit problems (namely, in the face of unobserved confounders), the rational agent requires both experimental and observational quantities. Based on this observation, we propose an optimization metric, employing both experimental and observational distributions, that bandit agents should pursue, and we illustrate its benefits over traditional algorithms.
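To make the central claim concrete, the following is a minimal simulation sketch (not from the paper; all names, probabilities, and the "natural choice" policy are hypothetical) of a two-armed Bernoulli bandit with a binary unobserved confounder U. Because U drives both the arm a passively observed agent tends to pull and the reward probabilities, the observational quantity P(Y=1 | X=x) and the experimental quantity P(Y=1 | do(X=x)) come apart.

```python
import random

random.seed(0)

# Toy two-armed bandit with a binary unobserved confounder U (hypothetical example).
N = 200_000
P_U = 0.5                                  # P(U = 1)
P_Y = {                                    # P(Y = 1 | X = x, U = u)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.6, (1, 1): 0.2,
}

def natural_choice(u):
    """Confounded policy: the observed agent's arm choice simply follows U."""
    return u

obs = {0: [0, 0], 1: [0, 0]}               # arm -> [successes, trials] under observation
exp = {0: [0, 0], 1: [0, 0]}               # arm -> [successes, trials] under intervention

for _ in range(N):
    u = int(random.random() < P_U)

    # Observational regime: arm chosen by the confounded natural policy.
    x_obs = natural_choice(u)
    obs[x_obs][0] += int(random.random() < P_Y[(x_obs, u)])
    obs[x_obs][1] += 1

    # Experimental regime: arm chosen by a coin flip independent of U (the do-operator).
    x_exp = int(random.random() < 0.5)
    exp[x_exp][0] += int(random.random() < P_Y[(x_exp, u)])
    exp[x_exp][1] += 1

for x in (0, 1):
    p_obs = obs[x][0] / obs[x][1]
    p_exp = exp[x][0] / exp[x][1]
    print(f"arm {x}: P(Y=1 | X={x}) ~ {p_obs:.3f}   P(Y=1 | do(X={x})) ~ {p_exp:.3f}")
```

With these illustrative parameters, arm 0 yields roughly 0.10 observationally but 0.30 experimentally, and arm 1 roughly 0.20 versus 0.40, so an agent estimating only one of the two distributions would misjudge the arms' payouts; this gap is what motivates combining experimental and observational quantities.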