Policy Shaping with Human Teachers

2019-11-20
Abstract: In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine whether the algorithm is suitable for human-generated data on two different boards in a Pac-Man domain, comparing its performance against an oracle that provides critique based on one known winning policy. Perhaps surprisingly, we show that the data generated by our 26 participants yields even better performance for the agent than data generated by the oracle. This might be because humans do not discourage exploring multiple winning policies. Additionally, we evaluate the impact of different verbal instructions and different interpretations of silence, finding that the usefulness of the data is affected both by what instructions are given to teachers and by how the data is interpreted.


