RL-Adventure-2: Policy Gradients

PyTorch tutorial of: actor critic / proximal policy optimization / acer / ddpg / twin dueling ddpg / soft actor critic / generative adversarial imitation learning / hindsight experience replay

The deep reinforcement learning community has made several improvements to the policy gradient algorithms. This tutorial presents latest extensions in the following order:

Advantage Actor Critic (A2C)

High-Dimensional Continuous Control Using Generalized Advantage Estimation

Proximal Policy Optimization Algorithms

Sample Efficient Actor-Critic with Experience Replay

Continuous control with deep reinforcement learning

Addressing Function Approximation Error in Actor-Critic Methods

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Generative Adversarial Imitation Learning

Hindsight Experience Replay

If you get stuck…

Remember you are not stuck unless you have spent more than a week on a single algorithm. It is perfectly normal if you do not have all the required knowledge of mathematics and CS.
Carefully go through the paper. Try to see what is the problem the authors are solving. Understand a high-level idea of the approach, then read the code (skipping the proofs), and after go over the mathematical details and proofs.