This repository contains an implementation of OpenAI's GPT model. In particular, it takes inspiration from the Nystromformer's idea of approximating the full softmax attention matrix to model longer sequences in NLP language modeling tasks: here the input token sequence is reduced in length by a simple strided average pooling, and the attention output at the reduced length is then upsampled back to the original sequence length using bilinear interpolation.
Note that, due to the simplicity of this approximation, the model's performance will not match that of the original GPT model, which uses the full attention matrix. The tradeoff is that this naive strided averaging allows longer sequences to be modelled than the original GPT implementation.
Fig. 1: GPT Model Architecture (obtained from GPT paper)
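The pooling-and-upsampling idea can be illustrated with a short TensorFlow sketch. This is a hypothetical helper, not the repository's actual layer; causal masking, residual connections, and layer normalisation are omitted for brevity:

```python
import tensorflow as tf

def pooled_attention(x, num_heads, stride):
    """Toy sketch: average-pool the sequence, attend at the reduced length,
    then bilinearly upsample the attention output back to the full length."""
    seq_len = tf.shape(x)[1]

    # 1. Reduce the sequence length by strided average pooling.
    x_short = tf.keras.layers.AveragePooling1D(
        pool_size=stride, strides=stride, padding="same")(x)

    # 2. Standard multi-head self-attention on the shorter sequence.
    attn_short = tf.keras.layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=x.shape[-1] // num_heads)(x_short, x_short)

    # 3. Upsample back to the original length with bilinear interpolation
    #    (treat the sequence as a 1-pixel-wide image so tf.image can resize it).
    attn_img = attn_short[:, :, tf.newaxis, :]               # (B, L/stride, 1, D)
    attn_full = tf.image.resize(
        attn_img, size=tf.stack([seq_len, 1]), method="bilinear")
    return tf.squeeze(attn_full, axis=2)                      # (B, L, D)

# Example: batch of 2 sequences, length 16, model width 64, pooling factor 4.
out = pooled_attention(tf.random.normal([2, 16, 64]), num_heads=4, stride=4)
print(out.shape)  # (2, 16, 64)
```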
This repository includes code to process the Movie Dialogue dataset, where the data preparation follows this script closely, as well as the Reddit Jokes dataset.
To prepare the data prior to training the model(s), run
python process_movie_dialogue_subword.py
for the Movie Dialogue dataset, or
python process_reddit_jokes_subword_v1.py
for the Reddit Jokes dataset.
Having processed the data into sub-word tokens, run
python train_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
python infer_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
or
python train_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py
python infer_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py
to train the respective models on the loaded dataset and perform inference with the trained model.
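For reference, inference with a decoder-only model of this kind typically generates a reply one sub-word token at a time. The sketch below shows plain greedy decoding; `model`, `sos_id`, and `eos_id` are placeholder names and do not mirror the inference scripts:

```python
import tensorflow as tf

def greedy_decode(model, sos_id, eos_id, max_len=50):
    """Hypothetical greedy decoder: feed the tokens generated so far back into
    the model and take the most likely next sub-word until EOS or max_len."""
    tokens = [sos_id]
    for _ in range(max_len):
        inputs = tf.constant([tokens], dtype=tf.int32)   # (1, current_length)
        logits = model(inputs, training=False)           # (1, current_length, vocab)
        next_id = int(tf.argmax(logits[0, -1, :]))       # predict from the last position
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]                                    # drop the SOS token
```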
The Primer-EZ architecture is implemented in the tf_ver2_gpt_primer_keras_ups.py
file. The training script train_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
is currently configured to use this model.
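For context, Primer-EZ modifies a vanilla Transformer in two ways: it squares the ReLU activation in the feed-forward block, and it adds a width-3 depthwise convolution along the sequence after the query/key/value projections. The sketch below illustrates both changes; the class and variable names are illustrative only and do not mirror tf_ver2_gpt_primer_keras_ups.py:

```python
import tensorflow as tf

def squared_relu(x):
    """Primer-EZ feed-forward activation: ReLU followed by squaring."""
    return tf.square(tf.nn.relu(x))

class DConvProjection(tf.keras.layers.Layer):
    """Sketch of Primer-EZ's depthwise-convolved projection: a dense projection
    followed by a width-3 depthwise convolution along the sequence dimension,
    applied separately to the query, key, and value paths."""
    def __init__(self, d_model, **kwargs):
        super().__init__(**kwargs)
        self.proj = tf.keras.layers.Dense(d_model)
        self.dconv = tf.keras.layers.DepthwiseConv2D(
            kernel_size=(3, 1), padding="valid")

    def call(self, x):                               # x: (batch, seq_len, d_model)
        h = self.proj(x)
        h = tf.pad(h, [[0, 0], [2, 0], [0, 0]])      # left-pad to keep the conv causal
        h = self.dconv(h[:, :, tf.newaxis, :])       # dummy spatial axis for Conv2D
        return tf.squeeze(h, axis=2)                 # back to (batch, seq_len, d_model)
```

In a Primer-EZ style block, a layer like this would replace the standard dense query/key/value projections, and squared_relu would replace the feed-forward ReLU/GELU activation.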