Understanding Knowledge Distillation in Neural Sequence Generation

Sequence-level knowledge distillation (KD) -- learning a student model with targets decoded from a pre-trained teacher model -- has been widely used...
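
To make the setup concrete, below is a minimal sketch of sequence-level KD, assuming a Hugging Face seq2seq model as both teacher and student (the checkpoints and hyperparameters here are illustrative, not from the paper; in practice the student is a smaller model trained over a full corpus of teacher-decoded targets):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative checkpoints; any seq2seq teacher/student pair would do.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
teacher = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # a smaller config in practice

source = ["translate English to German: Hello world."]
inputs = tokenizer(source, return_tensors="pt", padding=True)

# Step 1: decode sequence-level pseudo-targets from the frozen teacher
# (beam search approximates the teacher's mode).
with torch.no_grad():
    pseudo_targets = teacher.generate(**inputs, num_beams=5, max_length=64)

# Step 2: train the student with ordinary cross-entropy on the teacher's
# decoded output instead of the human reference.
labels = pseudo_targets[:, 1:].clone()           # drop the decoder start token
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
loss = student(input_ids=inputs["input_ids"],
               attention_mask=inputs["attention_mask"],
               labels=labels).loss
loss.backward()
```

The key design choice is that the student never sees the teacher's token-level probability distributions, only its decoded sequences, which makes the approach applicable whenever teacher outputs can be generated offline.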