
PPO

2019-09-20
  1. Add entropy term to encourage exploration (see the loss sketch after this list)

  2. GAE (generalized advantage estimation)

  3. Distributional

  4. Other environments

  5. Bigger -> slower nets

  6. The exploration noise causes NaN gradients, and thus NaN outputs (see the network sketch after this list)

  7. Need experience replay because it is clearly forgetting things from the past

  8. Use OpenAI examples

  9. Combine 2 nets into one -> works -> learns a bit slower, I think (see the network sketch after this list)

  10. Tuned hyper-parameters, specifically the roll-out size, number of updates, and batch size (typical values in the sketch after this list)

  11. Next step -> Try GAE (see the GAE sketch after this list)

  12. After -> Train in distributed setting with harder environments

  13. Compare to OpenAI baseline

  14. Incorporate into StarCraft
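
The entropy bonus from note 1 usually enters the objective as an extra term on the policy loss. Below is a minimal sketch of the clipped PPO surrogate with such a bonus, written in PyTorch as an assumption about the stack; the name `ppo_loss` and the defaults `clip_eps=0.2`, `entropy_coef=0.01` are illustrative, not this project's actual code.

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages, entropy,
             clip_eps=0.2, entropy_coef=0.01):
    # Probability ratio between the current policy and the one that
    # collected the roll-out.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective (PPO-Clip).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Subtracting the entropy encourages exploration (note 1).
    return policy_loss - entropy_coef * entropy.mean()
```

A small coefficient is the usual choice; too large a bonus keeps the policy close to uniform and slows learning.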
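Notes 2 and 11 mention GAE. A minimal sketch of generalized advantage estimation over one roll-out follows, assuming per-step rewards, value estimates and done flags plus a bootstrap value for the state after the roll-out; the name `compute_gae` and the defaults `gamma=0.99`, `lam=0.95` are assumptions, not this project's settings.

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a roll-out of length T."""
    rewards = np.asarray(rewards, dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # TD residual for step t.
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # Exponentially weighted sum of residuals (the GAE recursion).
        gae = delta + gamma * lam * next_non_terminal * gae
        advantages[t] = gae
    returns = advantages + values  # targets for the value head
    return advantages, returns
```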
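For notes 6 and 9, one common shape for the combined network is a shared trunk with separate policy and value heads, with the Gaussian log standard deviation clamped so the exploration noise cannot blow up into NaN gradients. The sketch below assumes PyTorch and continuous actions; the class name, layer sizes and clamp range are illustrative, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Single network with a shared trunk and policy/value heads (note 9)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)   # action mean
        self.value_head = nn.Linear(hidden, 1)      # state value
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.trunk(obs)
        mu = self.mu_head(h)
        # Clamping the log std keeps the exploration noise finite and
        # avoids the NaN gradients mentioned in note 6.
        log_std = self.log_std.clamp(-20.0, 2.0)
        dist = torch.distributions.Normal(mu, log_std.exp())
        value = self.value_head(h).squeeze(-1)
        return dist, value
```

Sharing the trunk is what note 9 describes; the slightly slower learning it mentions is commonly observed, since the policy and value gradients now compete for the same features.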
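Note 10 tunes the roll-out size, number of update epochs and batch size. For reference, a sketch of the kind of settings commonly seen in PPO implementations for continuous control; these are illustrative defaults, not the values tuned in this project.

```python
# Common PPO settings (illustrative, not this project's tuned values).
PPO_CONFIG = {
    "rollout_steps": 2048,   # environment steps collected per update
    "update_epochs": 10,     # passes over each roll-out
    "minibatch_size": 64,    # samples per gradient step
    "learning_rate": 3e-4,   # Adam step size
    "gamma": 0.99,           # discount factor
    "gae_lambda": 0.95,      # GAE smoothing parameter
    "clip_eps": 0.2,         # PPO clipping range
    "entropy_coef": 0.01,    # entropy bonus weight (note 1)
}
```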

