[Andrej Karpathy] Building makemore Part 4: Becoming a Backprop Ninja

Uploaded automatically via bot.
Original video:
This video belongs to the channel «Andrej Karpathy» (@AndrejKarpathy). It is presented in our community solely for informational, scientific, educational, or cultural purposes. Our community claims no rights to this video. Please support the author by visiting his original channel.
If you have a copyright claim regarding this video, please contact us by email at support@ and we will remove it immediately.

Original description:

We take the 2-layer MLP (with BatchNorm) from the previous video and backpropagate through it manually without using PyTorch autograd's (): through the cross entropy loss, 2nd linear layer, tanh, batchnorm, 1st linear layer, and the embedding table. Along the way, we get a strong intuitive understanding of how gradients flow backwards through the compute graph, and on the level of efficient Tensors, not just individual scalars like in micrograd. This helps build competence and intuition around how neural nets are optimized and sets you up to more confidently innovate on and debug modern neural networks.

!!!!!!!!!!!!
I recommend you work through the exercise yourself, but work with it in tandem, and whenever you are stuck unpause the video and see me give away the answer. This video is not really intended to be simply watched. The exercise is here:
!!!!!!!!!!!!

Links:
makemore on github:
jupyter notebook I built in this video:
colab notebook:
my website:
my twitter:
our Discord channel:

Supplementary links:
Yes you should understand backprop:
BatchNorm paper:
Bessel's Correction:
Bengio et al. 2003 MLP LM

Chapters:
intro: why you should care & fun history
starter code
exercise 1: backproping the atomic compute graph
brief digression: bessel's correction in batchnorm
exercise 2: cross entropy loss backward pass
exercise 3: batch norm layer backward pass
exercise 4: putting it all together
outro
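
To give a flavor of what backpropagating through the cross entropy loss by hand involves, here is a minimal sketch. It is not the code from the video's notebook: it uses plain Python lists instead of PyTorch tensors, and the function names (`softmax`, `cross_entropy`, `cross_entropy_backward`, `numerical_grad`) and the tiny example batch are made up for illustration. It relies on the standard result that, for a mean cross entropy loss over softmax outputs, the gradient with respect to the logits is the softmax probabilities minus the one-hot target, divided by the batch size, and it checks that result against a finite-difference estimate.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits_batch, targets):
    # Mean negative log-likelihood of the target class over the batch.
    total = 0.0
    for logits, y in zip(logits_batch, targets):
        total += -math.log(softmax(logits)[y])
    return total / len(targets)

def cross_entropy_backward(logits_batch, targets):
    # Manual gradient of the mean loss w.r.t. every logit:
    # (softmax(logits) - one_hot(target)) / batch_size
    n = len(targets)
    grads = []
    for logits, y in zip(logits_batch, targets):
        p = softmax(logits)
        grads.append([(p_j - (1.0 if j == y else 0.0)) / n
                      for j, p_j in enumerate(p)])
    return grads

def numerical_grad(logits_batch, targets, eps=1e-6):
    # Central finite differences, used only to check the manual gradient.
    grads = []
    for i, row in enumerate(logits_batch):
        g_row = []
        for j in range(len(row)):
            plus = [r[:] for r in logits_batch]
            minus = [r[:] for r in logits_batch]
            plus[i][j] += eps
            minus[i][j] -= eps
            g_row.append((cross_entropy(plus, targets)
                          - cross_entropy(minus, targets)) / (2 * eps))
        grads.append(g_row)
    return grads

# Hypothetical toy batch: 2 examples, 3 classes.
logits = [[2.0, -1.0, 0.5], [0.1, 0.2, -0.3]]
targets = [0, 2]

manual = cross_entropy_backward(logits, targets)
numeric = numerical_grad(logits, targets)
max_diff = max(abs(a - b)
               for ra, rb in zip(manual, numeric)
               for a, b in zip(ra, rb))
print(f"max abs difference: {max_diff:.2e}")
```

Comparing a hand-derived gradient against an independent estimate is the same kind of check the exercises rely on, except that there the reference would come from PyTorch autograd rather than finite differences.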
