Comparison snippet - Montezuma’s Revenge

Demonstration video for our paper “Playing hard exploration games by watching YouTube videos”. The sequence on the left is the expert video used for imitation; on the right is our learnt policy. While our agent follows the path taken by the expert, our method still allows the RL agent to optimize low-level skills, such as the timing of jumps.