https://arxiv.org/abs/2005.10419

Why distillation helps: a statistical perspective (Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar)

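Why does KD help? The authors argue that estimating the risk with the label probability distribution gives a lower-variance estimate than estimating it with one-hot labels. This suggests that calibration may matter. They use this idea to apply distillation to multiclass retrieval.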
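
A minimal sketch of the variance claim, under a toy setup that is not from the paper: three discrete inputs, hypothetical true class probabilities `p_star`, a fixed hypothetical model `q`, and log loss. Both estimators target the same population risk. The soft-label one averages the loss over the label distribution instead of sampling a single label, so it should vary less across repeated draws.

```python
import math
import random
import statistics


def one_hot_risk(samples, p_star, q, rng):
    """Empirical log loss using one sampled (one-hot) label per example."""
    total = 0.0
    for x in samples:
        # Draw a single label y ~ p_star(. | x), as in ordinary supervised data.
        y = rng.choices(range(len(p_star[x])), weights=p_star[x])[0]
        total += -math.log(q[x][y])
    return total / len(samples)


def soft_label_risk(samples, p_star, q):
    """Empirical log loss averaged over the full label distribution per example."""
    total = 0.0
    for x in samples:
        total += sum(p * -math.log(qy) for p, qy in zip(p_star[x], q[x]))
    return total / len(samples)


def compare_variance(n_trials=2000, n_samples=20, seed=0):
    """Return (mean_hard, mean_soft, var_hard, var_soft) over repeated datasets."""
    rng = random.Random(seed)
    # Hypothetical values chosen for illustration only.
    p_star = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]  # true p(y | x)
    q = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.4, 0.2, 0.4]]       # model being scored
    hard, soft = [], []
    for _ in range(n_trials):
        # Each trial draws a fresh dataset of inputs, uniformly over the 3 inputs.
        samples = [rng.randrange(len(p_star)) for _ in range(n_samples)]
        hard.append(one_hot_risk(samples, p_star, q, rng))
        soft.append(soft_label_risk(samples, p_star, q))
    return (
        statistics.mean(hard),
        statistics.mean(soft),
        statistics.variance(hard),
        statistics.variance(soft),
    )
```

Calling `compare_variance()` returns the two means and the two variances. In this toy setup the means should be close while the soft-label variance is smaller. The sketch uses the true `p_star` as the soft labels; a real teacher only approximates it, which is where calibration would come in.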

#distillation #calibration

200521 Why distillation helps.md