
Noisy Networks for Exploration


NoisyNet-A3C

MIT License

NoisyNet [1] applied to (LSTM) asynchronous advantage actor-critic (A3C) [2] on the CartPole-v1 environment. This repo has a minimalistic design and uses a classic control environment to enable quick investigation of different hyperparameters.
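The core building block of NoisyNet is a linear layer whose weights and biases are perturbed by learnable, factorised Gaussian noise. The following is a minimal PyTorch sketch of such a layer in the style of the NoisyNet paper [1]; the class and parameter names (`NoisyLinear`, `sigma_init`, `sample_noise`) are illustrative and may differ from the repo's own implementation.

```python
# Minimal sketch of a noisy linear layer with factorised Gaussian noise [1].
# Illustrative only; names and initialisation constants are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    def __init__(self, in_features, out_features, sigma_init=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        # Learnable means and standard deviations of the weights and biases
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers (sampled, not trained); typically resampled once per rollout in A3C
        self.register_buffer('weight_epsilon', torch.zeros(out_features, in_features))
        self.register_buffer('bias_epsilon', torch.zeros(out_features))
        self._reset_parameters(sigma_init)
        self.sample_noise()

    def _reset_parameters(self, sigma_init):
        bound = 1 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-bound, bound)
        self.bias_mu.data.uniform_(-bound, bound)
        self.weight_sigma.data.fill_(sigma_init / math.sqrt(self.in_features))
        self.bias_sigma.data.fill_(sigma_init / math.sqrt(self.in_features))

    @staticmethod
    def _f(x):
        # Factorised-noise transform f(x) = sgn(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def sample_noise(self):
        # Factorised Gaussian noise: outer product of two independent noise vectors
        eps_in = self._f(torch.randn(self.in_features))
        eps_out = self._f(torch.randn(self.out_features))
        self.weight_epsilon.copy_(eps_out.unsqueeze(1) * eps_in.unsqueeze(0))
        self.bias_epsilon.copy_(eps_out)

    def forward(self, x):
        # Perturb the mean parameters with the current noise sample
        weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
        bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
        return F.linear(x, weight, bias)
```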

Run with `python main.py <options>`. Entropy regularisation can still be added by setting `--entropy-weight <value>`, but it is 0 by default. Run with `--no-noise` to run normal A3C (without noisy linear layers).
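For reference, entropy regularisation in actor-critic methods is usually implemented by subtracting a weighted policy-entropy bonus from the policy-gradient loss, with the weight corresponding to `--entropy-weight`. The sketch below illustrates this in PyTorch; the function and tensor names are hypothetical and not taken from the repo.

```python
# Hypothetical sketch of an entropy-regularised policy loss (names are illustrative).
import torch

def policy_loss(log_probs, actions, advantages, entropy_weight=0.0):
    # log_probs: (T, num_actions) log-probabilities of the policy
    # actions: (T,) indices of the actions actually taken
    # advantages: (T,) advantage estimates, treated as constants
    action_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(action_log_probs * advantages.detach()).mean()
    # Policy entropy; subtracting it from the loss encourages exploration
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    return pg_loss - entropy_weight * entropy
```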

Requirements

To install all dependencies with Anaconda, run `conda env create -f environment.yml` and use `source activate noisynet` to activate the environment.

Results

NoisyNet-A3C

On the whole, NoisyNet-A3C tends to be better than A3C (with or without entropy regularisation). There seems to be more variance, with both good and poor runs, probably due to "deep" exploration.

(Figures: good-noisynet-a3c.png and bad-noisynet-a3c.png, showing a good and a poor NoisyNet-A3C run.)

NoisyNet-A3C is perhaps even more prone to performance collapses than normal A3C, although many deep reinforcement learning algorithms still suffer from this problem.

(Figure: collapse-noisynet-a3c.png, showing a NoisyNet-A3C performance collapse.)

A3C (no entropy regularisation)

A3C without entropy regularisation usually performs poorly.

(Figure: a3c.png, showing A3C without entropy regularisation.)

A3C (entropy regularisation with β = 0.01)

A3C with entropy regularisation usually performs a bit better than A3C without it, and also better than the poor runs of NoisyNet-A3C. However, its performance tends to be significantly worse than that of the best NoisyNet-A3C runs.

(Figure: a3c-entropy.png, showing A3C with entropy regularisation.)

Note that due to the nondeterminism introduced by asynchronous agents, even runs with the same seed can produce different results, so the results presented here are only single samples of the performance of these algorithms. Interestingly, the general observations above seem to hold even when increasing the number of processes (experiments were repeated with 16 processes). These algorithms are still sensitive to the choice of hyperparameters, and will need to be tuned extensively to get good performance on other domains.

Acknowledgements

References

[1] [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295)
[2] [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783)
