UCI AI/ML Seminar Series
Roy Fox
Assistant Professor
Department of Computer Science
University of California, Irvine
Curiously effective ensemble and double-oracle reinforcement-learning methods
Ensemble methods for reinforcement learning have gained attention in recent years because they can represent model uncertainty and use it to guide exploration and reduce value-estimation bias. We present MeanQ, a very simple ensemble method with improved performance, and show how it reduces estimation variance enough to operate without a stabilizing target network. Curiously, MeanQ is theoretically *almost* equivalent to a non-ensemble state-of-the-art method that it significantly outperforms, raising questions about the interaction between uncertainty estimation, representation, and resampling.
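To make the ensemble idea concrete, here is a minimal tabular sketch. It is not the authors' code. It rests on three assumptions drawn from our reading of the abstract:

- K Q-estimates share a single bootstrap target built from their mean.
- That target is read from the live estimates rather than from a frozen target network.
- Actions are chosen greedily with respect to the mean.

The names `ensemble_mean`, `greedy_action`, and `mean_q_update` are hypothetical, and a Q-table stands in for each member's deep network.

```python
from typing import List

# Tabular stand-in for one ensemble member's network: Q[state][action] -> value.
QTable = List[List[float]]


def ensemble_mean(ensemble: List[QTable], s: int) -> List[float]:
    """Average the members' action-value estimates at state s."""
    k = len(ensemble)
    n_actions = len(ensemble[0][s])
    return [sum(q[s][a] for q in ensemble) / k for a in range(n_actions)]


def greedy_action(ensemble: List[QTable], s: int) -> int:
    """Pick the action with the highest ensemble-mean value (assumed acting rule)."""
    means = ensemble_mean(ensemble, s)
    return max(range(len(means)), key=means.__getitem__)


def mean_q_update(ensemble: List[QTable], s: int, a: int, r: float,
                  s_next: int, done: bool,
                  gamma: float = 0.99, lr: float = 0.1) -> float:
    """Move every member's Q[s][a] toward one shared TD target.

    The bootstrap term is the max over actions of the ensemble mean at s_next,
    read from the current (online) tables: no frozen target copy is kept.
    """
    bootstrap = 0.0 if done else max(ensemble_mean(ensemble, s_next))
    target = r + gamma * bootstrap
    for q in ensemble:
        q[s][a] += lr * (target - q[s][a])
    return target


if __name__ == "__main__":
    # Three members, two states, two actions; only state 1 has nonzero estimates.
    ensemble = [
        [[0.0, 0.0], [1.0, 2.0]],
        [[0.0, 0.0], [3.0, 0.0]],
        [[0.0, 0.0], [2.0, 1.0]],
    ]
    target = mean_q_update(ensemble, s=0, a=1, r=1.0, s_next=1, done=False,
                           gamma=0.5, lr=0.5)
    print(target)                        # 2.0: mean at state 1 is [2.0, 1.0], so 1 + 0.5 * 2.0
    print([q[0][1] for q in ensemble])   # [1.0, 1.0, 1.0]
    print(greedy_action(ensemble, 1))    # 0
```

Averaging is where the variance reduction comes from. If the K members' errors were independent with equal variance σ², their mean would have variance σ²/K. Real ensemble members share data and architecture, so their errors are correlated and the reduction is smaller.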
In adversarial environments, where a second agent attempts to minimize the first’s rewards, double-oracle (DO) methods grow a population of policies…
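The abstract breaks off here, so the sketch below shows only the generic textbook double-oracle loop on a small zero-sum matrix game. It is not the speaker's method. The loop runs as follows:

1. Each player's population starts from one strategy.
2. The restricted game over the current populations is solved.
3. Each player adds a best response to the opponent's restricted-game mixture.
4. The loop stops when neither player adds anything new.

Assumptions: the function names are hypothetical, and the restricted game is solved approximately with regret matching, whereas exact double oracle usually solves it with a linear program. The best-response oracle is also simplified to an exact argmax over the payoff matrix, where an RL method would instead train a policy.

```python
from typing import List, Tuple

# Row player's payoffs in a two-player zero-sum game; the column player gets the negative.
Matrix = List[List[float]]


def regret_matching(payoff: Matrix, iters: int = 5000) -> Tuple[List[float], List[float]]:
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    Both players run regret matching in self-play; their time-averaged
    strategies approach an equilibrium as iters grows.
    """
    m, n = len(payoff), len(payoff[0])
    reg_r, reg_c = [0.0] * m, [0.0] * n
    avg_r, avg_c = [0.0] * m, [0.0] * n

    def strategy(regrets: List[float]) -> List[float]:
        # Play in proportion to positive regret; uniform if none is positive.
        pos = [max(g, 0.0) for g in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1.0 / len(pos)] * len(pos)

    for _ in range(iters):
        x, y = strategy(reg_r), strategy(reg_c)
        # Payoff of each pure action against the opponent's current mixture.
        u_r = [sum(payoff[i][j] * y[j] for j in range(n)) for i in range(m)]
        u_c = [-sum(payoff[i][j] * x[i] for i in range(m)) for j in range(n)]
        v_r = sum(xi * ui for xi, ui in zip(x, u_r))
        v_c = sum(yj * uj for yj, uj in zip(y, u_c))
        for i in range(m):
            reg_r[i] += u_r[i] - v_r
            avg_r[i] += x[i]
        for j in range(n):
            reg_c[j] += u_c[j] - v_c
            avg_c[j] += y[j]
    return [s / iters for s in avg_r], [s / iters for s in avg_c]


def double_oracle(game: Matrix, max_rounds: int = 50):
    """Grow both populations with best responses until neither player adds a new strategy."""
    rows, cols = [0], [0]  # each population starts from one arbitrary strategy
    x: List[float] = []
    y: List[float] = []
    for _ in range(max_rounds):
        # 1. Solve the restricted game over the current populations.
        sub = [[game[i][j] for j in cols] for i in rows]
        x_sub, y_sub = regret_matching(sub)
        # 2. Lift the restricted mixtures back to the full strategy sets.
        x = [0.0] * len(game)
        y = [0.0] * len(game[0])
        for p, i in zip(x_sub, rows):
            x[i] = p
        for p, j in zip(y_sub, cols):
            y[j] = p
        # 3. Oracles: exact best responses in the full game.
        br_row = max(range(len(game)),
                     key=lambda i: sum(game[i][j] * y[j] for j in range(len(y))))
        br_col = min(range(len(game[0])),
                     key=lambda j: sum(game[i][j] * x[i] for i in range(len(x))))
        # 4. Stop once both best responses are already in the populations.
        if br_row in rows and br_col in cols:
            break
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return rows, cols, x, y


if __name__ == "__main__":
    # Rock-paper-scissors: index 0 = rock, 1 = paper, 2 = scissors.
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    rows, cols, x, y = double_oracle(rps)
    print(rows, cols)                # [0, 1, 2] [0, 1, 2]
    print([round(p, 3) for p in x])  # [0.333, 0.333, 0.333]
```

On rock-paper-scissors the populations grow in stages: first rock alone, then rock and paper, then all three strategies. The final restricted equilibrium is the uniform mixture.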