Self-Tuning Networks: Amortizing the Hypergradient Computation for Hyperparameter Optimization
Optimization of many deep learning hyperparameters can be formulated as a bilevel optimization problem. While most black-box and gradient-based approaches require many independent training runs, we aim to adapt hyperparameters online as the network trains. The main challenge is to approximate the best-response Jacobian, which captures how the minimum of the inner objective changes as the hyperparameters are perturbed. To do this, we introduce the self-tuning network (STN), which fits a hypernetwork to approximate the best-response function in the vicinity of the current hyperparameters. Differentiating through the hypernetwork lets us efficiently approximate the gradient of the validation loss with respect to the hyperparameters. We train the hypernetwork and hyperparameters jointly. Empirically, we can find hyperparameter settings competitive with Bayesian optimization in a single run of training, and in some cases find hyperparameter schedules that outperform any fixed hyperparameter value.
Roger Grosse
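The alternation described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden toy version, not the paper's implementation: it uses a single L2-penalty hyperparameter, an affine hypernetwork w(λ) = Aλ + b as the local best-response approximation, and synthetic regression data; all names, step sizes, and the perturbation scale are illustrative.

```python
# Toy sketch of the self-tuning idea: alternate (1) fitting a hypernetwork to the
# training loss at hyperparameters perturbed around the current value, and
# (2) updating the hyperparameter with the validation-loss gradient taken
# through the hypernetwork. Hypothetical setup, not the paper's experiments.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X_train = jax.random.normal(k1, (100, 5))
w_true = jnp.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_train = X_train @ w_true + 0.5 * jax.random.normal(k2, (100,))
X_val = jax.random.normal(k3, (50, 5))
y_val = X_val @ w_true

def hypernet(phi, lam):
    # Affine approximation to the best response: weights as a function of lambda.
    A, b = phi
    return A * lam + b

def train_loss(phi, lam):
    # Inner objective: training MSE plus an L2 penalty with strength exp(lambda).
    w = hypernet(phi, lam)
    return jnp.mean((X_train @ w - y_train) ** 2) + jnp.exp(lam) * jnp.sum(w ** 2)

def val_loss(phi, lam):
    # Outer objective: validation MSE of the weights the hypernetwork predicts.
    w = hypernet(phi, lam)
    return jnp.mean((X_val @ w - y_val) ** 2)

phi = (jnp.zeros(5), jnp.zeros(5))   # hypernetwork parameters (A, b)
lam = jnp.array(0.0)                 # log L2 penalty, the hyperparameter
lr_phi, lr_lam, sigma = 1e-2, 1e-2, 0.1
rng = jax.random.PRNGKey(1)

for step in range(2000):
    # Inner step: train the hypernetwork at a perturbed hyperparameter so it
    # approximates the best response in a neighbourhood of the current lambda.
    rng, sub = jax.random.split(rng)
    lam_pert = lam + sigma * jax.random.normal(sub)
    g_phi = jax.grad(train_loss)(phi, lam_pert)
    phi = tuple(p - lr_phi * g for p, g in zip(phi, g_phi))

    # Outer step: differentiate the validation loss through the hypernetwork to
    # get an approximate hypergradient and update the hyperparameter.
    g_lam = jax.grad(val_loss, argnums=1)(phi, lam)
    lam = lam - lr_lam * g_lam
```

Because the hyperparameter keeps moving during training, the trajectory of lambda values acts as a schedule rather than a single fixed setting, which is the behaviour the abstract refers to.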