This short tutorial covers the basics of automatic differentiation, a set of techniques that allow us to efficiently compute derivatives of functions implemented as programs. It is based in part on Baydin et al., 2018: Automatic Differentiation in Machine Learning: A Survey ().
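As a rough illustration of what this means in code (a minimal sketch, not taken from the video; the function f and the sample input are made up), a library such as JAX can return the exact derivative of an ordinary Python program:

import jax
import jax.numpy as jnp

def f(x):
    # An ordinary program: a loop and elementary operations.
    y = x
    for _ in range(3):
        y = jnp.sin(y) * y
    return y

df = jax.grad(f)        # reverse-mode derivative of f
print(f(2.0), df(2.0))  # value of f and its exact derivative at x = 2.0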
Errata:
At 6:23 in the bottom right, it should be v̇6 = v̇5*v4 + v̇4*v5 (instead of “-“).
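(For context: this is just the product rule. Assuming v6 = v4*v5 in that trace, v̇6 = v̇4*v5 + v̇5*v4, so both terms carry a plus sign.)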
Additional references:
Griewank & Walther, 2008: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation ()
Adams, 2018: COS 324 – Computing Gradients with Backpropagation ()
Grosse, 2018: CSC 321 – Lecture 10: Automatic Differentiation (~rgrosse/courses/csc321_2018/slides/)
Pearlmutter, 1994: Fast exact multiplication by the Hessian (~barak/papers/)
Alleviating memory requirements of reverse mode (a checkpointing sketch follows these references):
Griewank & Walther, 2000: Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation ()
Dauvergne & Hascoët, 2006: The data-flow equations of checkpointing in reverse automatic differentiation ()
Chen et al., 2016: Training Deep Nets with Sublinear Memory Cost ()
Gruslys et al., 2016: Memory-efficient Backpropagation Through Time ()
Siskind & Pearlmutter: Divide-and-conquer checkpointing for arbitrary programs with no user annotation ()
Oktay et al., 2020: Randomized Automatic Differentiation ()
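As a rough sketch of the checkpointing idea from the references above (using PyTorch’s torch.utils.checkpoint as one concrete implementation; the block and sizes here are made up): intermediate values inside a block are recomputed during the backward pass instead of being stored, trading compute for memory.

import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # A chunk of the forward pass whose intermediates we choose not to store.
    return torch.sin(x) * x

x = torch.randn(1000, requires_grad=True)
y = x
for _ in range(10):
    # Recompute block's intermediates on the backward pass
    # instead of keeping them in memory.
    y = checkpoint(block, y, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # gradient with respect to x, obtained with less stored memory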
Example software libraries using various implementation routes:
Source code transformation:
Tangent –
Zygote –
Operator overloading (a minimal sketch follows this list):
Autograd –
Jax –
PyTorch –
Graph-based w/ embedded mini-language:
TensorFlow –
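To make the operator-overloading route concrete, here is a minimal forward-mode sketch using dual numbers (a toy example, not the actual code of any library listed above):

class Dual:
    # Forward mode via operator overloading: carry (value, derivative) pairs.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x):
    return x * x + x * 3.0   # an ordinary program built from * and +

x = Dual(2.0, 1.0)           # seed dx/dx = 1
y = f(x)
print(y.val, y.dot)          # f(2) = 10.0, f'(2) = 2*2 + 3 = 7.0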
Special thanks to Ryan Adams, Alex Beatson, Geoffrey Roeder, Greg Gundersen, and Deniz Oktay for feedback on this video.
Some of the animations in this video were created with 3Blue1Brown’s manim library ().
Music: Trinkets by Vincent Rubinetti
Links:
YouTube:
Twitter:
Homepage:
If you’d like to help support the channel (completely optional), you can donate a cup of coffee via the following:
Venmo:
PayPal: