Metatrace Actor-Critic: Online Step-Size Tuning by Meta-Gradient Descent
for Reinforcement Learning Control
Abstract
Reinforcement learning (RL) has had many successes, but significant hyperparameter tuning is
commonly required to achieve good performance.
Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety
of techniques exist to combat this, most notably experience replay and the use of parallel actors. These techniques stabilize learning by making the RL problem more similar to the supervised
setting. However, they come at the cost of moving away from the RL problem as it is typically formulated: a single agent learning online without maintaining a large database of training examples.
To address these issues, we propose Metatrace, a
meta-gradient-descent-based algorithm that tunes the step-size online. Metatrace leverages the structure of eligibility traces and works both for tuning a single scalar step-size and for tuning a separate step-size for each parameter. We empirically evaluate Metatrace for
actor-critic on the Arcade Learning Environment.
Results show that Metatrace can both speed up learning and improve performance in non-stationary settings.
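To make the core idea concrete, the sketch below shows meta-gradient descent on a log step-size for a linear TD(lambda) critic, in the spirit of the IDBD-style updates that Metatrace builds on. It is a simplified illustration under stated assumptions (linear value function, scalar step-size, semi-gradient meta-update), not the paper's full actor-critic algorithm; the function and parameter names (td_lambda_meta_step, meta_lr) are ours.

```python
# Hypothetical sketch: IDBD-style online step-size tuning for a linear
# TD(lambda) critic. Not the paper's full Metatrace actor-critic update.
import numpy as np

def td_lambda_meta_step(w, z, h, beta, x, r, x_next, gamma, lam, meta_lr):
    """One online update of weights w, eligibility trace z,
    meta-trace h (approx. dw/dbeta), and log step-size beta."""
    alpha = np.exp(beta)                  # exp keeps the step-size positive
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)  # TD error
    z = gamma * lam * z + x               # accumulating eligibility trace
    # Meta step: semi-gradient descent on 0.5*delta^2 w.r.t. beta, using
    # h as a running estimate of how past step-size choices shaped w.
    beta = beta + meta_lr * delta * np.dot(x, h)
    alpha = np.exp(beta)                  # refresh after the meta step
    # Update h: the new contribution alpha*delta*z, plus the change in
    # delta induced by h (semi-gradient: the future value term is fixed).
    h = h + alpha * delta * z - alpha * np.dot(x, h) * z
    w = w + alpha * delta * z             # usual TD(lambda) weight update
    return w, z, h, beta

# Usage (hypothetical): per transition (x, r, x_next) of an online rollout,
# w, z, h, beta = td_lambda_meta_step(w, z, h, beta, x, r, x_next,
#                                     gamma=0.99, lam=0.9, meta_lr=1e-3)
```

For the per-parameter variant mentioned in the abstract, beta becomes a vector and the dot products become elementwise, so each weight maintains its own adapted step-size.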