https://arxiv.org/abs/2102.12871

SparseBERT: Rethinking the Importance Analysis in Self-attention (Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok)

Pushing sparse attention further: in the BERT pretraining setting, they optimize the attention map under a sparsity constraint to extract only the attention patterns that are actually needed. Key finding: the diagonal elements of the attention map are largely unnecessary. A sketch of diagonal-free attention follows below.
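As a concrete illustration of that finding, here is a minimal PyTorch sketch of scaled dot-product attention with the diagonal (token-to-self) entries masked out. This is not the paper's actual optimization procedure; the function name and tensor shapes are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn.functional as F

def diagonal_masked_attention(q, k, v):
    """Scaled dot-product attention with the diagonal masked out,
    so no token attends to itself.

    q, k, v: (batch, heads, seq_len, head_dim)
    """
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (batch, heads, L, L)
    L = scores.size(-1)
    self_mask = torch.eye(L, dtype=torch.bool, device=scores.device)
    scores = scores.masked_fill(self_mask, float("-inf"))  # drop self-attention
    return F.softmax(scores, dim=-1) @ v

# usage: same shapes as a standard multi-head attention call
q = k = v = torch.randn(2, 12, 128, 64)
out = diagonal_masked_attention(q, k, v)  # (2, 12, 128, 64)
```

Masking with `-inf` before the softmax sends the diagonal weights to exactly zero while the remaining weights in each row still sum to one.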

#pretraining #attention #sparse_attention #bert