Teaching AI Agents Ethical Values
Using Reinforcement Learning and Policy Orchestration (Extended Abstract)
Abstract
Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they
behave in ways aligned with the values of society, we must develop techniques that allow these
agents to not only maximize their reward in an environment, but also to learn and follow the implicit
constraints of society. We detail a novel approach
that uses inverse reinforcement learning to learn
a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two
policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows
the agent to mix policies in novel ways, taking the
best actions from either a reward-maximizing or
constrained policy. In addition, the orchestrator is
transparent about which policy is being employed at
each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act
optimally, act within the demonstrated constraints,
and mix these two policies in complex ways.
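The orchestration idea described above can be sketched as a small multi-armed bandit that, at each step, decides which of the two underlying policies gets to act. This is a hypothetical simplification for illustration only: it drops the context features the paper's contextual bandit would use, substitutes an epsilon-greedy arm-selection rule, and the class/parameter names (`Orchestrator`, `epsilon`, the stub policies) are assumptions, not the authors' implementation.

```python
import random


class Orchestrator:
    """Toy two-armed bandit orchestrator (illustrative sketch, not the
    paper's algorithm). Arm 0 = reward-maximizing policy, arm 1 =
    constraint-obeying policy; epsilon-greedy replaces the paper's
    contextual bandit, and no context features are used."""

    def __init__(self, policies, epsilon=0.1, seed=0):
        self.policies = policies          # list of callables: state -> action
        self.epsilon = epsilon            # exploration rate (assumed value)
        self.counts = [0] * len(policies)
        self.values = [0.0] * len(policies)
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the arm
        # with the highest estimated value.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.policies))
        return max(range(len(self.policies)), key=lambda a: self.values[a])

    def act(self, state):
        # Returning the chosen arm alongside the action makes the
        # orchestrator transparent about which policy acted.
        arm = self.select_arm()
        return arm, self.policies[arm](state)

    def update(self, arm, reward):
        # Incremental mean update of the chosen arm's value estimate.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

As a usage sketch, feeding the orchestrator an environment whose reward penalizes constraint violations would drive the arm-value estimates toward the constraint-obeying policy, while exploration still occasionally samples the reward-maximizing one.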