Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (Paper Explained)

#ai #attention #transformer #deeplearning

Transformers are famous for two things: their superior performance and their insane compute and memory requirements. This paper reformulates the attention mechanism in terms of kernel functions and obtains a linear formulation, which reduces these requirements. Surprisingly, this formulation also surfaces an interesting connection between autoregressive transformers and RNNs.

OUTLINE:
0:00 - Intro & Overview
1:35 - Softmax Attention & Transformers
8:40 - Quadra
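
To make the kernel reformulation concrete, here is a minimal PyTorch sketch of causal linear attention written as a recurrence, which is exactly what exposes the RNN view: the running sums `s` and `z` are a constant-size hidden state, so generating each new token costs the same regardless of sequence length. The feature map phi(x) = elu(x) + 1 is the one proposed in the paper; everything else (function names, shapes, the eps constant) is illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def feature_map(x):
    # phi(x) = elu(x) + 1: a positive feature map standing in for the
    # softmax kernel, as proposed in the paper.
    return F.elu(x) + 1.0

def causal_linear_attention(q, k, v, eps=1e-6):
    """Causal linear attention computed as an RNN over the sequence.

    q, k: (batch, seq_len, d_k); v: (batch, seq_len, d_v).
    The state s accumulates phi(k_i) v_i^T and z accumulates phi(k_i),
    so each step costs O(d_k * d_v) instead of attending over all
    previous positions (the O(N^2) cost of softmax attention).
    """
    q, k = feature_map(q), feature_map(k)
    b, n, d_k = q.shape
    d_v = v.shape[-1]
    s = q.new_zeros(b, d_k, d_v)   # running sum of outer products phi(k) v^T
    z = q.new_zeros(b, d_k)        # running sum of phi(k), used as normalizer
    outputs = []
    for i in range(n):
        s = s + torch.einsum('bd,be->bde', k[:, i], v[:, i])
        z = z + k[:, i]
        num = torch.einsum('bd,bde->be', q[:, i], s)
        den = torch.einsum('bd,bd->b', q[:, i], z).unsqueeze(-1) + eps
        outputs.append(num / den)
    return torch.stack(outputs, dim=1)
```

During training the same quantity can be computed in parallel across positions; the sequential form above is the inference-time view in which the transformer literally behaves like an RNN with state (s, z).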