Do Vision Transformers See Like Convolutional Neural Networks? | Paper Explained
In this video I cover the "Do Vision Transformers See Like Convolutional Neural Networks?" paper. The authors dissect ViTs and ResNets, showing the differences in the features they learn as well as what drives those differences (such as the amount of training data, skip connections, etc.).
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ Paper:
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00 Intro
00:45 Contrasting features in ViTs vs CNNs
06:45 Global vs Local receptive fields
13:55 Data matters, Mr. Obvious
17:40 Contrasting receptive fields
20:30 Data flow through CLS vs spatial tokens
23:30 Skip connections matter a lot in ViTs
24:20 Spatial information preservation