What Do Neural Networks Really Learn? Exploring the Brain of an AI Model
Neural networks have become increasingly impressive in recent years, but there’s a big catch: we don’t really know what they are doing. We give them data and a way to get feedback, and somehow they learn all kinds of tasks. It would be really useful, especially for safety purposes, to understand what they have learned and how they work once they’ve been trained. The ultimate goal is not only to understand in broad strokes what they’re doing, but to precisely reverse-engineer the algorithms encoded in their parameters. This is the ambitious goal of mechanistic interpretability. As an introduction to the field, we show how researchers have partly reverse-engineered the way InceptionV1, a convolutional neural network, recognizes images.
▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
This topic is truly a rabbit hole. If you want to learn more about this important research and even contribute to it, check out this list of sources we’ve compiled for you on mechanistic interpretability and interpretability in general:
On Interpreting InceptionV1:
Feature visualization:
Zoom in: An Introduction to Circuits:
The Distill journal contains several articles that try to make sense of how exactly InceptionV1 does what it does:
OpenAI’s Microscope tool lets us visualize the neurons and channels of a number of vision models in great detail:
Here’s OpenAI’s Microscope tool pointed at layer Mixed3b in InceptionV1:
Activation atlases:
More recent work applying SAEs to InceptionV1:
Transformer Circuits Thread, the spiritual successor to the Circuits thread on InceptionV1, this time on transformers:
In the video, we cite “Toy Models of Superposition”:
We also cite “Towards Monosemanticity: Decomposing Language Models With Dictionary Learning”:
More recent progress:
Mapping the Mind of a Large Language Model:
Press:
Paper in the Transformer Circuits Thread:
Extracting Concepts from GPT-4:
Press:
Paper:
Browse features:
Language models can explain neurons in language models (cited in the video):
Press:
Paper:
View neurons:
Neel Nanda on how to get started with Mechanistic Interpretability:
Concrete Steps to Get Started in Transformer Mechanistic Interpretability:
Mechanistic Interpretability Quickstart Guide:
200 Concrete Open Problems in Mechanistic Interpretability:
More work mentioned in the video:
Progress measures for grokking via mechanistic interpretability:
Discovering Latent Knowledge in Language Models Without Supervision:
Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning:
▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, MERCH▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon:
🔵 Channel membership:
🟢 Merch:
🟤 Ko-fi, for one-time and recurring donations:
▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Discord:
Reddit:
X/Twitter:
▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
AAAA you don’t fit in the description this time! But we thank you from the bottom of our hearts. All of you, in this Google Doc:
▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
All the good doggos who worked on this video: