GPT2 Pytorch

An extremely simple and understandable GPT-2 implementation with minor tweaks.

Advantages

  • You can train even the subword tokenizer yourself, which is useful for non-English languages (see the sketch after this list).
  • Fast, optimized code; a single GTX 2080 Ti card is enough.
  • Easy to understand, solid code
  • Easy to extend for new experiments
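
As a rough illustration of the tokenizer-training point above, here is a minimal sketch of training a BPE subword tokenizer with the sentencepiece library. The corpus file name, vocabulary size, and model prefix are placeholders, and this is not necessarily the training script this repo ships with.

```python
import sentencepiece as spm

# Train a BPE subword tokenizer on a raw-text corpus (one sentence per line).
# "corpus.txt", the vocab size, and the model prefix are hypothetical placeholders.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="tokenizer",
    vocab_size=32000,
    model_type="bpe",
)

# Load the trained model and encode/decode text, including non-English text.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("merhaba dünya", out_type=int)
print(sp.decode(ids))
```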

Supported extra features

  • Lamb optimizer
  • Mixed precision training; the important layers are kept in fp32.
  • Sinusoidal (sin/cos) positional encoding (see the sketch below).
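
For the sin/cos positional encoding, the sketch below follows the standard formulation from "Attention Is All You Need": PE[pos, 2i] = sin(pos / 10000^(2i/d_model)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)). The function name, shapes, and the even d_model assumption are illustrative and may differ from this repo's actual implementation.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) table of sin/cos positional encodings (d_model assumed even)."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )                                                                    # (d_model / 2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe

# Usage: add the encoding to token embeddings of shape (batch, seq_len, d_model).
# pe = sinusoidal_positional_encoding(1024, 768)
# x = token_embeddings + pe[: token_embeddings.size(1)]
```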