Multi-objective Bandits: Optimizing the Generalized Gini Index

2020-03-09

Abstract

We study the multi-armed bandit (MAB) problem where the agent receives vectorial feedback that encodes many possibly competing objectives to be optimized. The goal of the agent is to find a policy that can optimize these objectives simultaneously in a fair way. This multi-objective online optimization problem is formalized by using the Generalized Gini Index (GGI) aggregation function. We propose an online gradient descent algorithm which exploits the convexity of the GGI aggregation function and controls the exploration in a careful way, achieving a distribution-free regret bound with high probability. We test our algorithm on synthetic data as well as on an electric battery control problem, where the goal is to trade off the use of the different cells of a battery in order to balance their respective degradation rates.
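The abstract rests on two ideas: GGI is a convex aggregation of the objectives, and the policy is improved by online gradient descent. The Python sketch below illustrates both under loudly stated assumptions. Objectives are treated as costs to be minimized, the weights are non-increasing (so the worst-off objective gets the largest weight, which makes GGI convex), and the mean cost vector of every arm is assumed known, so there is no sampling, estimation, or exploration control at all. It is a plain projected subgradient method over mixed strategies on the probability simplex, not the authors' bandit algorithm; all function names, the step size `eta`, and the toy `means` are hypothetical.

```python
from typing import List, Sequence


def ggi(x: Sequence[float], w: Sequence[float]) -> float:
    """Generalized Gini Index of a cost vector x (lower is better).

    Assumes non-increasing, non-negative weights w[0] >= w[1] >= ... so the
    largest (worst) cost receives the largest weight; this makes GGI convex in x.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    worst_first = sorted(x, reverse=True)
    return sum(wi * ci for wi, ci in zip(w, worst_first))


def ggi_subgradient(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """A subgradient of GGI at x: the objective with the i-th largest cost gets w[i]."""
    order = sorted(range(len(x)), key=lambda j: x[j], reverse=True)
    g = [0.0] * len(x)
    for rank, j in enumerate(order):
        g[j] = w[rank]
    return g


def project_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold of the largest index that stays positive
    return [max(vi - theta, 0.0) for vi in v]


def mixed_cost(alpha: Sequence[float], means: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost vector when arm k is played with probability alpha[k]."""
    dims = len(means[0])
    return [sum(a * m[d] for a, m in zip(alpha, means)) for d in range(dims)]


def ogd_step(alpha, means, w, eta):
    """One projected subgradient step on the map alpha -> GGI(mixed_cost(alpha))."""
    g = ggi_subgradient(mixed_cost(alpha, means), w)
    # Chain rule: the derivative of the mixed cost with respect to alpha[k] is means[k].
    grad = [sum(gd * md for gd, md in zip(g, m)) for m in means]
    return project_simplex([a - eta * gk for a, gk in zip(alpha, grad)])


def run_ogd(means, w, eta=0.1, steps=200) -> List[float]:
    """Start from the uniform strategy and return the averaged iterate."""
    k = len(means)
    alpha = [1.0 / k] * k
    total = [0.0] * k
    for _ in range(steps):
        alpha = ogd_step(alpha, means, w, eta)
        total = [t + a for t, a in zip(total, alpha)]
    return [t / steps for t in total]


if __name__ == "__main__":
    # Hypothetical toy instance: 3 arms, 2 objectives, known mean costs.
    means = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
    w = [1.0, 0.5]
    for k, m in enumerate(means):
        print(f"arm {k}: GGI = {ggi(m, w):.3f}")
    alpha = run_ogd(means, w)
    print("averaged mixed strategy:", [round(a, 3) for a in alpha])
    print("GGI of the mixture:", round(ggi(mixed_cost(alpha, means), w), 3))
```

In this toy instance every single arm has a GGI of at least 0.9, while the 50/50 mixture of the first two arms has cost vector (0.5, 0.5) and GGI 0.75, which illustrates why it can pay to optimize over mixed strategies rather than single arms. The averaged iterate is returned because a fixed-step subgradient method tends to oscillate around the kink where two costs are equal.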

Popular resources

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...