https://arxiv.org/abs/2004.05686
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (Subhabrata Mukherjee, Ahmed Awadallah)
TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER (Subhabrata Mukherjee, Ahmed Awadallah)
Distilling a multilingual NER model into a BiLSTM student. At this point it feels like the model has been squeezed down about as tightly as it can be.
#bert #distillation #lightweight