Skip to content
/ betsi Public

A light implementation of the 2017 Google paper 'Attention is all you need'.

License

Notifications You must be signed in to change notification settings

taxborn/betsi

Repository files navigation

BETSI

arXiv paper

A light implementation of the 2017 Google paper 'Attention is all you need'. BETSI is the name of the model, which is a recursive acronym standing for BETSI: English to shitty Italian, as the training time I allowed on my graphics card did not give enough time for amazing results.

For this implementation I will implement a translation from English to Italian, as Tranformer models are exceptional at language translation and this seems to be a common use of light implementations of this paper.

The dataset I will be using is the opus books dataset which is a collection of copyright free books. The book content of these translations are free for personal, educational, and research use. OPUS language resource paper.

Notes

I'm creating notes as I go, which can be found in NOTES.md.

Transformer model architecture

Transformer model

Requirements

There is a requirements.txt that has the packages needed to run this. I used PyTorch with ROCm as this sped up training A LOT. Training this model on CPU on my laptop takes around 5.5 hours per epoch, while training the model on GPU on my desktop takes around 13.5 minutes (24.4 times faster!).

TODO and tenative timeline:

  • Input Embeddings
  • Positional Encoding
  • Layer Normalization - Due by 11/1
  • Feed forward
  • Multi-Head attention
  • Residual Connection
  • Encoder
  • Decoder - Due by 11/8
  • Linear Layer
  • Transformer
  • Tokenizer - Due by 11/15
  • Dataset
  • Training loop
  • Visualization of the model - Due by 11/22
  • Install AMD RocM to train with GPU - Attempt to do by end

References used

About

A light implementation of the 2017 Google paper 'Attention is all you need'.

Topics

Resources

License

Stars

Watchers

Forks