This repo contains implementations of several forms of attention (a sketch of the dot-product variant follows the list):
- Location-based attention
- Content-based dot-product attention
- Content-based concatenation attention
- Content-based general attention
- Pointer networks
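
As a reference point, here is a minimal sketch of the content-based dot-product variant. It assumes PyTorch and batch-first tensors; the function name and shapes are illustrative, not the repo's exact API.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(decoder_state, encoder_outputs):
    """Content-based dot-product attention (illustrative sketch).

    decoder_state:   (batch, hidden)          current decoder hidden state
    encoder_outputs: (batch, src_len, hidden) all encoder hidden states
    """
    # Score each encoder state by its dot product with the decoder state
    scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)                                          # attention distribution
    # Context vector: attention-weighted sum of encoder states
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)       # (batch, hidden)
    return context, weights
```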
Each of these sequence-to-sequence models is trained to sort a shuffled array of the numbers 1 to N. The code to generate this data is here.
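
The repo's generation code is the authoritative source for the data format; as a rough sketch of the idea, each training pair could be built like this (the `make_example` helper is hypothetical):

```python
import random

def make_example(n):
    """One (input, target) pair: a shuffled permutation of 1..n and its sorted order."""
    xs = list(range(1, n + 1))
    random.shuffle(xs)
    return xs, sorted(xs)

# e.g. make_example(5) -> ([3, 1, 5, 2, 4], [1, 2, 3, 4, 5])
```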
The attention-based models show a considerable improvement over the no-attention baseline.
All the models and the data loader are defined in `code/`.
- Each model is defined in a separate file. The file containing a model also contains `train` and `test` functions, which are self-explanatory.
- Output logs are stored under `training_outputs/`.
- Attention weights can be visualized using the code in the notebook Visualizing attention (a plotting sketch follows this list).
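
The notebook is the authoritative reference; a minimal matplotlib sketch of the same idea, with illustrative names (`plot_attention`, and `weights` as a 2-D array of attention scores), might look like:

```python
import matplotlib.pyplot as plt

def plot_attention(weights, src_tokens, out_tokens):
    """Heatmap of attention weights: rows are output steps, columns are input positions."""
    fig, ax = plt.subplots()
    im = ax.imshow(weights, cmap="viridis")  # weights: (len(out_tokens), len(src_tokens))
    ax.set_xticks(range(len(src_tokens)))
    ax.set_xticklabels([str(t) for t in src_tokens])
    ax.set_yticks(range(len(out_tokens)))
    ax.set_yticklabels([str(t) for t in out_tokens])
    ax.set_xlabel("input position")
    ax.set_ylabel("output step")
    fig.colorbar(im, ax=ax)
    plt.show()
```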