tf-a3c-gpu

TensorFlow implementation of the A3C algorithm using a GPU (untested, but it should also be trainable on a CPU).

The original paper, "Asynchronous Methods for Deep Reinforcement Learning", suggests a CPU-only implementation, since the environment can only be executed on the CPU, which otherwise causes unavoidable communication overhead between the CPU and the GPU.

However, by storing all parameters of the policy and value networks on the GPU, we can reduce the communication to just the current state and reward instead of whole parameter sets. Furthermore, we can increase GPU utilization by running multiple agents per thread. The current implementation (with minor hyperparameter tuning) uses 4 threads, each with 64 agents. With this setting, I was able to achieve a 2x speedup. (Huh, a little disappointing, isn't it?)
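The batching idea can be sketched in plain Python. The environment and policy below are hypothetical stand-ins (not the classes in this repo): each worker thread steps a whole batch of environments, so only a batch of states crosses to the GPU and only a batch of actions comes back, while the network parameters never move.

```python
import random

NUM_AGENTS = 64  # agents batched per worker thread, as in this repo


class DummyEnv:
    """Stand-in for an Atari environment; returns a fake state and reward."""

    def reset(self):
        return [0.0] * 4

    def step(self, action):
        return [random.random() for _ in range(4)], 1.0


def batched_policy(states):
    """Stub for a single batched forward pass on the GPU.

    In the real code this would be one session.run() over the whole
    batch of states, amortizing the CPU-GPU transfer across 64 agents.
    """
    return [0 for _ in states]  # e.g. always pick action 0


envs = [DummyEnv() for _ in range(NUM_AGENTS)]
states = [env.reset() for env in envs]

# One "step" for the whole thread: one batch of states in, one batch
# of actions out, then every environment advances on the CPU.
actions = batched_policy(states)
results = [env.step(a) for env, a in zip(envs, actions)]
states = [s for s, _ in results]
rewards = [r for _, r in results]
```

The point is that one forward pass now serves 64 agents, instead of 64 separate passes each paying its own transfer overhead.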

Therefore, this implementation is not an exact re-implementation of the paper, and the effect of batching multiple agents per thread is worth examining (different numbers of threads and agents per thread). (I am still curious about how A3C achieves such good results. Is the asynchronous update the only key? I couldn't find other explanations for the effectiveness of this method.) Still, it gave me quite a competitive result (3 hours of training on Breakout-v0 for reasonable play), so it could be a good base for someone to start with.

Enjoy :)

Requirements

  • Python 2.7

  • TensorFlow v1.2

  • OpenAI Gym v0.9

  • scipy, PIL (for image resizing)

  • tqdm (optional)

  • better-exceptions (optional)

Training Results

  • Training on Breakout-v0 was done with an NVIDIA Titan X (Pascal) GPU for 28 hours

  • With the hyperparameters I used, one step corresponds to 64 * 5 input frames (64 agents * 5-step rollouts, where each input frame covers 3 game frames on average).

  • Orange line: with reward clipping (rewards clipped to [-1, 1]) + gradient normalization; purple line: without them

    • by the number of steps

    • by the number of episodes

    • by the time
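The step-size arithmetic above works out as follows (the variable names here are illustrative, not the ones in the repo):

```python
AGENTS_PER_THREAD = 64    # batch size per worker thread
ROLLOUT_LENGTH = 5        # n-step rollout collected before each update
AVG_FRAMES_PER_INPUT = 3  # average emulator frames behind one input frame

inputs_per_step = AGENTS_PER_THREAD * ROLLOUT_LENGTH
game_frames_per_step = inputs_per_step * AVG_FRAMES_PER_INPUT

print(inputs_per_step)       # 320 input frames per step
print(game_frames_per_step)  # ~960 emulator frames per step
```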
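For concreteness, the two tricks behind the orange line can be sketched in plain Python. This mirrors what reward clipping and TensorFlow's `tf.clip_by_global_norm` do, but it is not the repo's actual code:

```python
def clip_reward(r, lo=-1.0, hi=1.0):
    """Clip a raw game reward into [-1, 1], as in the 'orange line' runs."""
    return max(lo, min(hi, r))


def clip_by_global_norm(grads, max_norm):
    """Pure-Python analogue of tf.clip_by_global_norm.

    Rescales a flat list of gradient values so that their global L2 norm
    is at most max_norm; gradients under the threshold pass through.
    """
    norm = sum(g * g for g in grads) ** 0.5
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]


print(clip_reward(4.0))                       # 1.0
print(clip_by_global_norm([3.0, 4.0], 1.0))   # ≈ [0.6, 0.8]
```

Both tricks bound the magnitude of updates, which tends to stabilize training when raw rewards (e.g. brick values in Breakout) vary widely.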

Training from scratch

  • All the hyperparameters are defined in the ac3.py file. Change them as you want, then run:

python ac3.py

Validation with trained models

  • If you want to see the trained agent play, use the command:

python ac3-test.py --model ./models/breakout-v0/last.ckpt --out /tmp/result

Notes & Acknowledgement

