TensorFlow implementation of the A3C algorithm on GPU (untested on CPU, but it should be trainable there as well).
The original paper, "Asynchronous Methods for Deep Reinforcement Learning",
suggests a CPU-only implementation, since the environment can only be executed on the CPU, which otherwise
causes inevitable communication overhead between CPU and GPU.
However, by keeping all parameters of the policy and value networks on the
GPU, we can reduce the communication to just the current state and reward
instead of whole parameter sets. Furthermore, we can
achieve higher GPU utilization by running multiple agents per
thread.
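The batching idea above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual code; `BatchedAgents` and the environment factory passed to it are hypothetical names. The point is that one thread steps many environments and stacks their observations, so the network (resident on the GPU) sees a single batch per step.

```python
import numpy as np


class BatchedAgents(object):
    """Hypothetical sketch: run many environments in one thread so the
    policy/value network can process all their states as one GPU batch."""

    def __init__(self, make_env, n_agents):
        self.envs = [make_env() for _ in range(n_agents)]
        self.states = np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        # Only (state, reward, done) cross the CPU/GPU boundary each step;
        # the network parameters never leave the GPU.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        states, rewards, dones = zip(*[(s, r, d) for s, r, d, _ in results])
        self.states = np.stack(states)
        return self.states, np.array(rewards), np.array(dones)
```

With 64 agents per thread, the forward pass that selects actions then runs on a batch of 64 states instead of 64 separate single-state calls.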
The current implementation (with minor hyperparameter tuning) uses 4
threads, each with 64 agents. With this setting, I was able to
achieve a 2x speedup. (Huh, a little disappointing, isn't
it?)
Therefore, this implementation is not an exact re-implementation
of the paper, and
the effect of batching multiple agents per thread is worth
examining (with different numbers of threads and agents per thread).
(However, I am still curious how A3C achieves such nice
results. Is the asynchronous update the only key? I couldn't find
other explanations for the effectiveness of this method.)
Still, it gave me quite competitive results (3 hours of training on
Breakout-v0 for reasonable play), so it could be a good base for
someone to start with.
Enjoy :)
Requirements
Python 2.7
Tensorflow v1.2
OpenAI Gym v0.9
scipy, PIL (for image resize)
tqdm(optional)
better-exceptions(optional)
Training Results
Training on Breakout-v0 was done on an NVIDIA Titan X (Pascal) GPU for 28 hours.
With the hyperparameters I used, one step corresponds to 64 * 5 frames of input (64 * 5 * an average of 3 game frames).
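As a sanity check on that arithmetic (the rollout length of 5 and the average frame-skip of ~3 are taken from the sentence above):

```python
agents_per_thread = 64   # agents batched in each thread
rollout_length = 5       # steps per update (t_max in A3C)
avg_frame_skip = 3       # average emulator frames per action in Breakout

inputs_per_step = agents_per_thread * rollout_length          # 320 network inputs
emulator_frames_per_step = inputs_per_step * avg_frame_skip   # ~960 game frames
```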
Orange line: with reward clipping (reward clipped to [-1, 1]) + gradient normalization; purple line: without them.
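For reference, the two tricks behind the orange line can be sketched in plain NumPy. This is a hedged analogue, not the repository's code; the real implementation would use TensorFlow ops (e.g. `tf.clip_by_global_norm` for the gradient step).

```python
import numpy as np


def clip_reward(r):
    # Clip the raw game reward into [-1, 1] before computing returns.
    return float(np.clip(r, -1.0, 1.0))


def clip_by_global_norm(grads, max_norm):
    # NumPy analogue of tf.clip_by_global_norm: rescale all gradients
    # jointly so that their combined L2 norm does not exceed max_norm.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-8))
    return [g * scale for g in grads], global_norm
```

Reward clipping keeps the scale of returns comparable across games; global-norm clipping keeps one large gradient from destabilizing the shared parameters.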