https://arxiv.org/abs/2002.10957

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou)

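Distills the self-attention of the transformer's last layer, and additionally computes dot products between values (in the same way self-attention does between queries and keys) and distills those value-value relations as well. As transformers get compressed, it seems interesting approaches tailored to that goal keep appearing.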
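A minimal NumPy sketch of the two terms described above, for my own reference and not the authors' implementation. It assumes the teacher and student share the number of heads and the sequence length (head dimensions may differ), uses KL divergence for both terms, and sums them with equal weight. The function names (`scaled_dot_relation`, `mean_kl`, `minilm_style_loss`) and the random tensors are hypothetical stand-ins for real last-layer queries, keys, and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_relation(a, b):
    # a, b: (heads, seq_len, head_dim) -> (heads, seq_len, seq_len)
    # With (queries, keys) this is the attention distribution;
    # with (values, values) it is the value-value relation.
    d = a.shape[-1]
    scores = a @ b.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores, axis=-1)

def mean_kl(p, q, eps=1e-12):
    # KL(p || q) per row, averaged over heads and positions.
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())

def minilm_style_loss(q_t, k_t, v_t, q_s, k_s, v_s):
    # Assumption: equal weighting of the two terms, last layer only.
    attn_loss = mean_kl(scaled_dot_relation(q_t, k_t),
                        scaled_dot_relation(q_s, k_s))
    value_loss = mean_kl(scaled_dot_relation(v_t, v_t),
                         scaled_dot_relation(v_s, v_s))
    return attn_loss + value_loss

# Hypothetical shapes: teacher head_dim 64, student head_dim 32.
rng = np.random.default_rng(0)
heads, seq_len = 4, 6
teacher = [rng.normal(size=(heads, seq_len, 64)) for _ in range(3)]
student = [rng.normal(size=(heads, seq_len, 32)) for _ in range(3)]
loss = minilm_style_loss(*teacher, *student)
```

Because both terms compare `seq_len x seq_len` matrices, the student's hidden size does not have to match the teacher's, which is what makes this kind of distillation convenient for shrinking the model.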

#language_model #distillation #lightweight
