Safety-Aware Algorithms for Adversarial Contextual Bandit


2020-03-09


Abstract

In this work we study the safe sequential decision-making problem in the setting of adversarial contextual bandits with sequential risk constraints. At each round, nature prepares a context, a cost for each arm, and additionally a risk for each arm. The learner leverages the context to pull an arm and receives the cost and risk associated with the pulled arm. In addition to minimizing the cumulative cost, the learner must, for safety purposes, make decisions such that the average of the cumulative risk from all pulled arms does not exceed a pre-defined threshold. To address this problem, we first study online convex programming in the full-information setting, where in each round the learner receives an adversarial convex loss and a convex constraint. We develop a meta-algorithm leveraging online mirror descent for the full-information setting and then extend it to the contextual bandit setting with sequential risk constraints using expert advice. Our algorithms achieve near-optimal regret in terms of minimizing the total cost while maintaining sublinear growth of the cumulative risk constraint violation. We support our theoretical results by demonstrating our algorithm on a simple simulated robotics reactive control task.
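
To make the full-information setting concrete, the following is a minimal sketch of a generic primal-dual scheme, not the paper's algorithm. It assumes linear per-round costs and risks over a probability distribution on arms, uses exponentiated gradient (mirror descent with the entropy mirror map) for the primal step, and a projected gradient ascent step on a single dual variable. The function name `safe_exponentiated_gradient`, the step sizes `eta` and `mu`, and the two-arm toy data are all hypothetical choices made for illustration.

```python
import math


def safe_exponentiated_gradient(costs, risks, threshold, eta=0.1, mu=0.1):
    """Full-information primal-dual sketch over the probability simplex.

    costs, risks: per-round lists with one entry per arm (illustrative inputs).
    threshold: per-round risk budget.
    Returns the distributions played in each round and the final dual variable.
    """
    n_arms = len(costs[0])
    p = [1.0 / n_arms] * n_arms  # primal iterate: distribution over arms
    lam = 0.0                    # dual variable penalizing risk violation
    history = []
    for c, r in zip(costs, risks):
        history.append(p)
        expected_risk = sum(pi * ri for pi, ri in zip(p, r))
        # Primal step: exponentiated gradient on the Lagrangian cost c + lam * r.
        w = [pi * math.exp(-eta * (ci + lam * ri)) for pi, ci, ri in zip(p, c, r)]
        z = sum(w)
        p = [wi / z for wi in w]
        # Dual step: gradient ascent on the violation, projected onto lam >= 0.
        lam = max(0.0, lam + mu * (expected_risk - threshold))
    return history, lam


# Toy problem (assumed data): arm 0 is cheap but risky, arm 1 is costly but safe.
T = 2000
threshold = 0.2
costs = [[0.0, 0.5]] * T
risks = [[1.0, 0.0]] * T
history, lam = safe_exponentiated_gradient(costs, risks, threshold)
avg_risk = sum(p[0] * 1.0 + p[1] * 0.0 for p in history) / T
print("average expected risk:", avg_risk, "final dual variable:", lam)
```

Because the dual variable accumulates the per-round violations, the average expected risk in this sketch is at most `threshold + lam / (mu * T)`, so a bounded dual variable implies that the average violation vanishes as the horizon grows. The bandit setting described in the abstract would additionally require estimating costs and risks from the single pulled arm, which this sketch does not attempt.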
