Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games

2019-11-22

Abstract

Monte Carlo Tree Search (MCTS) methods have proven powerful for planning in sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and the number of sampled trajectories are limited, or when rewards are sparse. We present an adaptation of PGRD (policy-gradient for reward design) for learning a reward-bonus function to improve UCT (an MCTS algorithm). Previous applications of PGRD limited the space of reward-bonus functions to linear functions of hand-coded state-action features. Here, we instead use PGRD with a multi-layer convolutional neural network that both learns features automatically from raw perception and adapts the parameters of the non-linear reward-bonus function. We also adopt a variance-reducing gradient method to improve PGRD's performance. The new method improves UCT's performance on multiple ATARI games compared to UCT without the reward bonus. Combining PGRD and deep learning in this way should make reward adaptation for MCTS algorithms far more widely and practically applicable than before.
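The following minimal Python sketch illustrates the core mechanism, under stated assumptions. It shows depth-limited UCT in which the planner maximizes the objective reward plus a reward bonus R(s, a). This is not the authors' implementation. Several parts are hypothetical stand-ins:

- the toy chain environment (`GOAL`, `step`);
- the helper names (`uct_search`, `ucb_score`);
- the hand-coded `right_bonus`, which replaces the paper's learned convolutional network.

The sketch also simplifies the search. Tree nodes are keyed by (state, depth), and there is no separate rollout phase. The PGRD gradient update of the bonus parameters, including the variance-reducing variant, is omitted.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy environment, not from the paper: a chain of states 0..GOAL.
# Action 0 moves left, action 1 moves right, and the objective reward is
# 1.0 only when the agent reaches GOAL (a sparse reward).
GOAL = 20
ACTIONS = (0, 1)


def step(state, action):
    """Deterministic transition; returns (next_state, objective_reward)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def ucb_score(q, n_sa, n_s, c):
    """UCB1 score used by UCT; unvisited actions score +inf so they are tried first."""
    if n_sa == 0:
        return math.inf
    return q + c * math.sqrt(math.log(n_s) / n_sa)


def uct_search(root, bonus, n_trajectories=200, depth=5, c=1.0, gamma=0.95, seed=0):
    """Depth-limited UCT that plans with objective reward + bonus(state, action).

    Simplifications (assumptions of this sketch): nodes are keyed by
    (state, depth), every step of a trajectory uses UCB1 selection, and there
    is no separate rollout policy.
    """
    rng = random.Random(seed)
    n_s = defaultdict(int)    # N(node): visits to (state, depth)
    n_sa = defaultdict(int)   # N(node, a): times action a was taken there
    q = defaultdict(float)    # Q(node, a): running mean of sampled returns

    def simulate(state, d):
        if d == depth:          # planning horizon reached
            return 0.0
        node = (state, d)
        n_s[node] += 1
        scores = [ucb_score(q[node, a], n_sa[node, a], n_s[node], c) for a in ACTIONS]
        best = max(scores)
        # Pick the highest UCB1 score, breaking ties at random.
        action = rng.choice([a for a, s in zip(ACTIONS, scores) if s == best])
        nxt, r_obj = step(state, action)
        # The bonus only shapes the planner's internal reward; the
        # environment's objective reward r_obj is unchanged.
        r_plan = r_obj + bonus(state, action)
        ret = r_plan + gamma * simulate(nxt, d + 1)
        n_sa[node, action] += 1
        q[node, action] += (ret - q[node, action]) / n_sa[node, action]
        return ret

    for _ in range(n_trajectories):
        simulate(root, 0)
    # Act greedily with respect to the root's estimated action values.
    return max(ACTIONS, key=lambda a: q[(root, 0), a])


def right_bonus(state, action):
    """Hypothetical hand-coded bonus standing in for the learned CNN bonus."""
    return 0.1 if action == 1 else 0.0


print(uct_search(0, right_bonus))
```

In this toy setup, `GOAL` lies 20 steps from state 0 and the horizon is 5, so every objective reward inside the search is zero. Without a bonus, all sampled returns would be zero. The bonus gives UCT a signal for discriminating between actions. This is the limited-depth, sparse-reward situation the abstract targets. In the paper, the bonus is learned rather than hand-written.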

Popular resources

Deep Cross-media ...
Cross-media retrieval is a research hotspot in ...

Regularizing RNNs...
Recently, caption generation with an encoder-de...

Learning Expressi...
Facial expression is temporally dynamic event w...

Attributed Graph ...
Graph clustering is a fundamental task which di...

Compact MDDs for ...
Pseudo-Boolean (PB) constraints are usually en...
