async-deep-rl

What is in this repo?

Join the chat at https://gitter.im/traai/async-deep-rl

A Tensorflow-based implementation of all algorithms presented in Asynchronous Methods for Deep Reinforcement Learning.

This implementation uses processes instead of threads to achieve real concurrency. Each process has a local replica of the network(s) used, implemented in Tensorflow, and runs its own Tensorflow session. In addition, a copy of the network parameters is kept in a shared memory space. At runtime, each process uses its own local network(s) to choose actions and compute gradients (with Tensorflow). The shared network parameters are updated periodically and asynchronously, by applying the gradients obtained from Tensorflow to the parameters in shared memory.
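The sketch below illustrates this scheme; it is not the repo's actual code, and the names make_shared_params, apply_gradients and sync_local_params are hypothetical. A flat parameter vector lives in shared memory, each worker process copies it into its local network, and locally computed gradients are written back without locking (Hogwild-style).

import multiprocessing as mp
import numpy as np

def make_shared_params(size):
    # Allocate a flat float32 parameter vector in shared memory (lock-free).
    raw = mp.RawArray('f', size)
    return np.frombuffer(raw, dtype=np.float32)   # numpy view over the shared block

def apply_gradients(shared_params, grads, lr):
    # Asynchronous SGD step: write straight into shared memory, no lock.
    shared_params -= lr * grads

def sync_local_params(shared_params):
    # Each process copies the shared parameters into its local network
    # (a private copy stands in here for the Tensorflow assign ops).
    return shared_params.copy()

if __name__ == '__main__':
    params = make_shared_params(10)
    apply_gradients(params, np.ones(10, dtype=np.float32), lr=0.0007)
    print(sync_local_params(params)[:3])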

Both ALE and OpenAI Gym environments can be used.

Results

The graphs below show the reward achieved in different games by a single actor during training (i.e., not averaged over several runs and over all actors, as in the paper). All experiments were run on a rather old machine equipped with 2 Xeon E5540 quad-core 2.53GHz CPUs (16 virtual cores) and 47 GB of RAM.

Boxing-v0 (from OpenAI Gym), A3C, 100 actors, lr=0.0007, 80M steps in 59h 31m: boxing_v0.png

As you can see, the score achieved is much higher than the one reported in the paper. That is the effect of having 100 actors: exploring the environment concurrently in different ways clearly helps the learning process and removes the need for experience replay. Note, however, that the training time is slightly worse than with fewer actors, probably because our implementation is not optimal and because of the limitations of the machine we used.

Pong (from ALE), A3C, 16 actors, lr=0.0007, 80M steps in 48h: pong.png

Beam Rider (from ALE), A3C, 16 actors, lr=0.0007, 80M steps in 45h 25m: beamrider.png

Breakout (from ALE), A3C, 15 actors, lr=0.0007, 80M steps in 53h 22m: breakout.png

How to run the algorithms (MacOSX for now)?

A number of hyperparameters can be specified. Default values have been chosen according to the paper and to information @muupan received from the authors. To see the full list, please run:

python main.py -h

If you just want to see the code in action, you can kick off training with the default hyperparameters by running:

python main.py pong --rom_path ../atari_roms/

To run outside of Docker, you need to install some dependencies:

  • Tensorflow

  • OpenAI Gym

  • The Arcade Learning Environment (ALE). (Note that OpenAI Gym uses ALE internally, so you could use that version, but it would require some hacking.)

  • Scikit-image

  • OpenCV (cv2), for standalone ALE. (It should be possible to change the code in emulator.py to use scikit-image instead of cv2; see the sketch after this list. In fact, cv2 might slow things down.)
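As a rough illustration of such a swap, the usual Atari frame preprocessing can be done with scikit-image alone. This is only a sketch, not the actual emulator.py code, and the function name preprocess is made up:

import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess(frame):
    # Convert an RGB Atari frame to grayscale and downsample it to the
    # 84x84 input size typically fed to the network.
    gray = rgb2gray(frame)              # float image in [0, 1]
    small = resize(gray, (84, 84))      # would replace the cv2 resize call
    return small.astype(np.float32)

if __name__ == '__main__':
    fake_frame = np.random.randint(0, 256, (210, 160, 3), dtype=np.uint8)
    print(preprocess(fake_frame).shape)  # (84, 84)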

To run inside Docker:

(1) Clone this repo at ~/some-path.

(2) Make sure your machine has Docker installed. Follow the instructions [here](https://docs.docker.com/toolbox/toolbox_install_mac/) if not. [These](https://docs.docker.com/toolbox/toolbox_install_windows/) instructions may work for Windows.

(3) Make sure you have XQuartz installed in order to visualise game play. Run the following in a separate terminal window:

$ brew cask install --force xquartz
$ open -a XQuartz
$ socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:"$DISPLAY"

(4) Pull our Docker image, which contains all dependencies needed to run the algorithms and visualise game play.

$ docker pull restrd/tensorflow-atari-cpu

(5) Run the Docker image. This will mount your home folder to /your-user-name inside the container. Be sure to give the container a name (<container-name> below):

$ docker run -d -p 8888:8888 -p 6006:6006 --name "<container-name>" -v ~/:/root/$usr -e DISPLAY=$(ifconfig vboxnet0 | awk '$1 == "inet" {gsub(/\/.*$/, "", $2); print $2}'):0 -it docker.io/restrd/tensorflow0.10-atari-cpu

(6) Shell into the container.

$ docker exec -it <container-name> /bin/bash

(7) Go to the algorithms folder (/your-user-name/some-path/async-deep-rl/algorithms) and choose which algorithm to run via the configuration options in main.py.

(8) For example, to run the algorithms using OpenAI Gym with 16 processes and visualise the games:

$ python main.py BeamRider-v0 --env GYM -n 16 -v 1

Running TensorBoard

You can also run TensorBoard to visualise losses and game scores.

(1) Configure port forwarding rules in [VirtualBox](https://www.virtualbox.org/). Go to your running virtual machine's Settings > Network > Port Forwarding and add a new rule (see the row starting with tb in the picture below).

tb.png

(2) Run TensorBoard from within the container:

$ tensorboard --logdir=/tmp/summary_logs/ &

(3) If you skipped step (1), get the IP address of the Docker host running inside [VirtualBox](https://www.virtualbox.org/) and go to http://<docker-host-ip>:6006

If you did step (1), go to http://127.0.0.1:6006
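For reference, here is a minimal sketch of how scalar summaries can end up under /tmp/summary_logs/ for TensorBoard to read. This is an assumption about the logging pattern, not the repo's exact code, and the tag episode_reward is made up; tf.train.SummaryWriter is the TF 0.10-era class (newer TensorFlow versions call it tf.summary.FileWriter):

import tensorflow as tf

writer = tf.train.SummaryWriter('/tmp/summary_logs/')   # tf.summary.FileWriter in newer TF

def log_score(step, score):
    # Build a summary protobuf by hand and append it to the event file
    # that TensorBoard reads.
    summary = tf.Summary(value=[tf.Summary.Value(tag='episode_reward',
                                                 simple_value=float(score))])
    writer.add_summary(summary, step)
    writer.flush()

log_score(0, 21.0)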

