Teaching AI Agents Ethical Values
Using Reinforcement Learning and Policy Orchestration (Extended Abstract)
Abstract
Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they
behave in ways aligned with the values of society, we must develop techniques that allow these
agents to not only maximize their reward in an environment, but also to learn and follow the implicit
constraints of society. We detail a novel approach
that uses inverse reinforcement learning to learn
a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two
policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows
the agent to mix policies in novel ways, taking the
best actions from either a reward-maximizing or
constrained policy. In addition, the orchestrator is
transparent about which policy is being employed at
each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act
optimally, act within the demonstrated constraints,
and mix these two policies in complex ways.
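The orchestration idea described above can be sketched as a small multi-armed bandit that, at each step, decides which of the two underlying policies gets to act. This is a hypothetical simplification for illustration only: it drops the context features the paper's contextual bandit would use, substitutes an epsilon-greedy arm-selection rule, and the class/parameter names (`Orchestrator`, `epsilon`, the stub policies) are assumptions, not the authors' implementation.

```python
import random


class Orchestrator:
    """Toy two-armed bandit orchestrator (illustrative sketch, not the
    paper's algorithm). Arm 0 = reward-maximizing policy, arm 1 =
    constraint-obeying policy; epsilon-greedy replaces the paper's
    contextual bandit, and no context features are used."""

    def __init__(self, policies, epsilon=0.1, seed=0):
        self.policies = policies          # list of callables: state -> action
        self.epsilon = epsilon            # exploration rate (assumed value)
        self.counts = [0] * len(policies)
        self.values = [0.0] * len(policies)
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the arm
        # with the highest estimated value.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.policies))
        return max(range(len(self.policies)), key=lambda a: self.values[a])

    def act(self, state):
        # Returning the chosen arm alongside the action makes the
        # orchestrator transparent about which policy acted.
        arm = self.select_arm()
        return arm, self.policies[arm](state)

    def update(self, arm, reward):
        # Incremental mean update of the chosen arm's value estimate.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

As a usage sketch, feeding the orchestrator an environment whose reward penalizes constraint violations would drive the arm-value estimates toward the constraint-obeying policy, while exploration still occasionally samples the reward-maximizing one.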