This repository contains a from-scratch implementation of the Transformer model described in the paper "Attention Is All You Need" by Vaswani et al. (2017). The Transformer is an attention-based architecture that has been widely adopted across Natural Language Processing (NLP) tasks.
- Customizable model hyperparameters, including the number of layers, the number of attention heads, and the model dimension.
- Includes both the encoder and decoder components of the Transformer model.
- Uses PyTorch for efficient tensor computation and automatic differentiation.
- Includes Positional Encoding, Multi-Head Attention, and Feed Forward Network modules.
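The Positional Encoding and Multi-Head Attention modules listed above follow standard formulations from the paper: sinusoidal encodings added to the input embeddings, and attention computed as softmax(QK^T / sqrt(d_k))V. A minimal sketch of those two pieces is below (function names are illustrative, not this repository's API):

```python
import math
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # Sinusoidal encoding from the paper:
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions (e.g. padding or future tokens in the decoder)
        # receive -inf so they get zero attention weight after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Tiny demo: one batch of 4 tokens with model dimension 8,
# using the same tensor for queries, keys, and values (self-attention).
x = torch.randn(1, 4, 8) + positional_encoding(4, 8)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)  # torch.Size([1, 4, 8]) torch.Size([1, 4, 4])
```

Multi-head attention splits the model dimension into several such attention computations run in parallel over learned projections; the feed-forward network is a per-position two-layer MLP applied after attention.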
This project is licensed under the MIT License. See the LICENSE file for details.