PyTorch implementation of TRPO
This repo contains a PyTorch implementation of a Trust Region Policy Optimization (TRPO) agent for environments with a discrete action space.
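In standard TRPO, the policy update approximately solves F x = g with conjugate gradient. Here F is the Fisher information matrix, accessed only through Fisher-vector products, and g is the gradient of the surrogate objective. The resulting direction is then scaled so that the quadratic estimate of the KL divergence between the old and new policies equals the trust-region size. With a discrete action space the policy outputs a categorical distribution, so the constraint is a KL divergence between categorical distributions. The following is a minimal pure-Python sketch of those pieces, not this repo's code. It assumes a user-supplied `fvp` function and an illustrative `max_kl` of 0.01, and it omits the policy network and the backtracking line search.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def categorical_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) between two discrete action distributions given as probabilities."""
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def conjugate_gradient(fvp: Callable[[Vector], Vector], g: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, with x starting at zero
    d = list(g)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        fd = fvp(d)
        alpha = rr / dot(d, fd)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * fi for ri, fi in zip(r, fd)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x


def trpo_step(fvp: Callable[[Vector], Vector], g: Vector,
              max_kl: float = 0.01) -> Vector:
    """Natural-gradient step scaled so that 0.5 * step^T F step == max_kl."""
    x = conjugate_gradient(fvp, g)
    shs = 0.5 * dot(x, fvp(x))  # quadratic KL estimate for the unscaled step
    return [math.sqrt(max_kl / shs) * xi for xi in x]
```

In a full agent, `fvp` is usually computed with automatic differentiation from the mean categorical KL over a batch of states. The proposed step is then shrunk by a line search until the actual KL and surrogate improvement are acceptable.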
Environment Setup
1. Install conda for Python 2.7.
2. Create and activate the conda environment, then install the pip dependencies:
   conda create --name trpo --file requirements/conda_requirements.txt
   source activate trpo
   pip install -r requirements/pip_requirements.txt
3. Install PyTorch from source at commit eff5b8b.
Usage
python run_trpo.py --env=GYM_ENV_ID
where GYM_ENV_ID is the ID of the Gym environment you wish to train on.
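The `--env=GYM_ENV_ID` form is the standard `--flag=value` syntax that Python's `argparse` accepts. The following sketch shows how such a flag can be declared. It is hypothetical: the actual parser in run_trpo.py may define it differently or take additional options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the --env flag is documented in this README.
    parser = argparse.ArgumentParser(
        description="Train a TRPO agent on a Gym environment")
    parser.add_argument("--env", required=True,
                        help="Gym environment ID to train on")
    return parser
```

Both `--env=Pong-v0` and `--env Pong-v0` parse to the same value with this setup.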
Results
A game of Pong played by the policy learned by the TRPO agent
Plot of total reward per episode vs. episode number for Pong
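A curve like the one above needs one total per episode, summed from per-step rewards. In Pong, each point won gives +1 and each point lost gives -1. The following is a minimal sketch of that bookkeeping. It assumes per-step `rewards` and `done` flags collected from the environment, and the helper name is illustrative rather than taken from this repo.

```python
from typing import List, Sequence


def episode_returns(rewards: Sequence[float], dones: Sequence[bool]) -> List[float]:
    """Sum per-step rewards into one total per completed episode."""
    totals: List[float] = []
    running = 0.0
    for reward, done in zip(rewards, dones):
        running += reward
        if done:  # episode boundary: record the total and start a new one
            totals.append(running)
            running = 0.0
    return totals  # a trailing, unfinished episode is not included
```

Plotting `totals[i]` against episode number `i + 1` gives the reward-per-episode curve.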
Related Repos
OpenAI Baselines' implementation of parallel TRPO in TensorFlow
Ilya Kostrikov's implementation of TRPO for continuous control in PyTorch