
parallel-trpo

2020-01-16


A parallel implementation of Trust Region Policy Optimization on environments from OpenAI Gym.

Now includes hyperparameter adaptation as well! For more info, check my post on this project.

I'm working towards the ideas in this OpenAI research request. The code is based on this implementation.

I'm currently working with Danijar on an updated version of this preliminary paper, which describes the multiple-actors setup.
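As a rough illustration of the multiple-actors idea, several workers can collect rollouts concurrently and hand them to a single learner for one policy update. This is a hedged sketch, not the repo's actual code; the function names and the constant-reward stand-in are made up for the example:

```python
import threading
import queue

def rollout_worker(worker_id, timesteps, out):
    # Stand-in for running a separate copy of the gym environment and
    # collecting a trajectory; a real worker would step the env here.
    rewards = [1.0] * timesteps
    out.put((worker_id, rewards))

def collect_parallel(num_threads, timesteps_per_batch):
    # Split the batch budget evenly across workers, run them concurrently,
    # then gather every partial trajectory for the policy update.
    out = queue.Queue()
    per_worker = timesteps_per_batch // num_threads
    threads = [threading.Thread(target=rollout_worker, args=(i, per_worker, out))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [out.get() for _ in range(num_threads)]
```

With --num_threads 4 and --timesteps_per_batch 100, each worker would contribute 25 timesteps to the batch.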

How to run:

# This just runs a simple training on Reacher-v1.
python main.py

# For the commands used to recreate results, check trials.txt

Parameters:

--task: what gym environment to run on
--timesteps_per_batch: how many timesteps for each policy iteration
--n_iter: number of iterations
--gamma: discount factor for future rewards
--max_kl: maximum KL divergence between new and old policy
--cg_damping: damping on the KL constraint (ratio of the original gradient to use)
--num_threads: how many async threads to use
--monitor: whether to monitor progress for publishing results to gym or not
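To make the role of --gamma concrete, here is a minimal sketch (not taken from this repo) of how a discount factor turns a reward sequence into discounted returns:

```python
def discounted_returns(rewards, gamma):
    # Accumulate from the end so each step's return is
    # reward + gamma * (return of the following step).
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

print(discounted_returns([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```

A gamma close to 1 weights distant rewards almost as heavily as immediate ones; smaller values make the policy favor short-term reward.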

