Then, we ensure that we evaluate on a separate set of held-out random
seeds for the environment, which should be disjoint from both the test-set
seeds and the training seeds. For MuJoCo environments, where the random
seed affects the episode, we can simply set the random seed before each
rollout. For Atari, we would instead have to create a new validation set
of rollouts, perhaps with different human starts.
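The MuJoCo-style case might look like the following sketch, where the seed is fixed before each rollout over a disjoint validation set (the seed ranges and `rollout_fn` are illustrative placeholders; a real rollout would also seed the environment and any other RNGs, such as NumPy and the deep-learning framework):

```python
import random

# Disjoint seed sets: train, test, and a held-out validation set
# (the ranges here are illustrative, not from the post).
TRAIN_SEEDS = range(0, 10)
TEST_SEEDS = range(10, 20)
VALID_SEEDS = range(20, 30)

def evaluate(rollout_fn, seeds):
    """Average return of `rollout_fn` over held-out seeds, fixing the
    random seed before each rollout (the MuJoCo-style case, where the
    seed determines the episode)."""
    returns = []
    for seed in seeds:
        random.seed(seed)  # set the seed before the rollout
        returns.append(rollout_fn())
    return sum(returns) / len(returns)
```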
We create a file with functions for evaluation:
eval.py
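The contents of eval.py are not shown here; a minimal sketch of what it might contain, assuming a Gym-style environment API (a `reset(seed=...)` method and a five-tuple `step`, as in Gym >= 0.26 — the function names are placeholders):

```python
# eval.py -- evaluation helpers (hypothetical sketch)

def run_rollout(env, policy, seed, max_steps=1000):
    """Roll out `policy` in `env` under a fixed seed; return the
    episode return."""
    obs, _ = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward

def evaluate(env, policy, seeds):
    """Average return over a set of held-out validation seeds."""
    return sum(run_rollout(env, policy, s) for s in seeds) / len(seeds)
```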
Then we simply register the evaluation to run after training our algorithm:
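One way this registration might be structured (a sketch, not the post's actual code: with Orion, `report_fn` would be `orion.client.report_objective`, while `train_fn` and `evaluate_fn` stand in for your own training and eval.py code):

```python
# Sketch of hooking held-out validation into the hyperparameter
# search loop; all names here are illustrative placeholders.

def train_and_report(train_fn, evaluate_fn, report_fn,
                     lr, gamma, valid_seeds):
    model = train_fn(lr=lr, gamma=gamma)
    score = evaluate_fn(model, valid_seeds)  # held-out validation seeds
    report_fn(-score)  # Orion minimizes, so negate the validation return
    return score
```

The search could then be launched with something like `orion hunt -n my_exp python train.py --lr~'loguniform(1e-5, 1e-2)' --gamma~'uniform(0.95, 0.999)' --logdir '{trial.hash_name}'`, where the prior ranges are placeholders; the `~'...'` prior syntax and the `{trial.hash_name}` template come from Orion's command-line interface.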
Notice that this will search over the learning rate and gamma
values, while setting the log directory name to the hashed trial name
provided by the Orion database.