DQN-tensorflow

资源分类

DQN-tensorflow

2019-08-19 |

248 |

0 |

DQN-tensorflow

Human-Level Control through Deep Reinforcement Learning

Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning.

This implementation contains:

1、Deep Q-network and Q-learning

2、Experience replay memory

2-1、to reduce the correlations between consecutive updates

3、Network for Q-learning targets are fixed for intervals

3-1、to reduce the correlations between target and predicted Q-values

Requirements

1、Python 2.7 or Python 3.3+

2、gym

3、tqdm

4、SciPy or OpenCV2

5、TensorFlow 0.12.0

Usage

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for Breakout:

$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True

To test and record the screen with gym:

$ python main.py --is_train=False
$ python main.py --is_train=False --display=True

Results

Result of training for 24 hours using GTX 980 ti.

Simple Results

Details of Breakout with model m2(red) for 30 hours using GTX 980 Ti.

Details of Breakout with model m3(red) for 30 hours using GTX 980 Ti.

Detailed Results

[1] Action-repeat (frame-skip) of 1, 2, and 4 without learning rate decay

[2] Action-repeat (frame-skip) of 1, 2, and 4 with learning rate decay

[1] & [2]

[3] Action-repeat of 4 for DQN (dark blue) Dueling DQN (dark green) DDQN (brown) Dueling DDQN (turquoise)

The current hyper parameters and gradient clipping are not implemented as it is in the paper.

[4] Distributed action-repeat (frame-skip) of 1 without learning rate decay

[5] Distributed action-repeat (frame-skip) of 4 without learning rate decay

References

simple_dqn

Code for Human-level control through deep reinforcement learning

License

MIT License.

上一篇：Fully Convolutional Networks for Semantic Segmentation

下一篇：Colorful Image Colorization

用户评价

全部评价

还没有评论，说两句吧！

热门资源

TensorFlow-Course

This repository aims to provide simple and read...
seetafaceJNI

项目介绍基于中科院seetaface2进行封装的JAVA...
mxnet_VanillaCNN

This is a mxnet implementation of the Vanilla C...
vsepp_tensorflow

Improving Visual-Semantic Embeddings with Hard ...
DuReader_QANet_BiDAF

Machine Reading Comprehension on DuReader Usin...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com