Welcome to the σ-GPT repository.
This repository is based on the following codebases:
- Picoclvr https://fleuret.org/git/picoclvr
- NanoGPT https://github.com/karpathy/nanoGPT
- An additional KV-cache implementation from Vincent Micheli and Eloi Alonso, from the IRIS codebase https://github.com/eloialonso/iris
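A KV cache speeds up autoregressive decoding by storing the keys and values of already-processed tokens, so each new token only computes attention against the cache instead of recomputing the whole prefix. A minimal sketch of the idea (an illustration only, not the actual IRIS implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Stores past keys/values so each decoding step attends over the
    cache (O(T) work) instead of re-encoding the full prefix (O(T^2))."""
    def __init__(self, d):
        self.keys = np.empty((0, d))
        self.values = np.empty((0, d))

    def step(self, q, k, v):
        # Append this token's key/value, then attend over the whole cache.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        att = softmax(q @ self.keys.T / np.sqrt(q.shape[-1]))
        return att @ self.values

rng = np.random.default_rng(0)
d = 8
cache = KVCache(d)
for t in range(5):
    q, k, v = rng.normal(size=(3, d))  # toy projections for one token
    out = cache.step(q, k, v)
```

After five steps the cache holds five key/value pairs and each `step` returns one context vector.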
Detailed instructions are provided in the sections below, but here are some one-liners to get you started (assuming the environment is set up correctly):
Install and train a character-level text model on CPU:

```shell
git clone git@github.com:idiap/sigma-gpt.git
cd sigma-gpt
git submodule update --init --recursive
cd text
(cd nanoGPT/data/shakespeare_char/; python prepare.py)
python train.py nanoGPT/config/train_shakespeare_char.py --max_iters=20000 --device=cpu
```
Then you can evaluate the model with:

```shell
python sample.py nanoGPT/config/train_shakespeare_char.py --device=cpu --max_tokens=255 --verbose=True
```

(Remove `--device=cpu` if you have a GPU; it should also work with `mps` on recent Macs.)
The implementation is adapted from two well-known codebases: picoclvr
from François Fleuret and nanoGPT
from Andrej Karpathy. We organized the code so that the two codebases are imported as submodules.
First, clone the repository:
```shell
git clone ...
```
Then set up the submodules:
```shell
git submodule update --init --recursive
```
The environment.yml file contains all the required dependencies. You should be able to create a working environment by running this command in the main folder:
```shell
conda env create -n sigma-gpt -f environment.yml
```
For testing the model with fast training and evaluation on CPU, you can use the Shakespeare dataset. The model is trained on the character level.
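Character-level modeling means the vocabulary is simply the set of distinct characters in the corpus, with a lookup table in each direction. A sketch of this kind of tokenization (an illustration of the idea, not nanoGPT's actual `prepare.py`):

```python
# Character-level tokenization, as used for the Shakespeare dataset.
text = "First Citizen: Before we proceed any further, hear me speak."

chars = sorted(set(text))                    # vocabulary = distinct characters
stoi = {c: i for i, c in enumerate(chars)}   # char -> integer id
itos = {i: c for i, c in enumerate(chars)}   # integer id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text  # round trip is lossless
```

The real script builds these tables over the full corpus and writes the encoded ids to `train.bin` and `val.bin`.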
First, go to the text folder:
cd text
And prepare the dataset:
```shell
(cd nanoGPT/data/shakespeare_char/; python prepare.py)
```
Then you can train with:

```shell
python train.py nanoGPT/config/train_shakespeare_char.py --device=cpu
```

(Remove `--device=cpu` if you have a GPU; it should also work with `mps` on recent Macs.)
Evaluation:

```shell
python sample.py nanoGPT/config/train_shakespeare_char.py --device=cpu --max_tokens=255 --verbose=True
```
With `--verbose=True` you can see the generation in the terminal.
For larger pipelines, you can try:

```shell
python train.py
python sample.py
```

These commands train the model and output results for each epoch.
`main.py` accepts many arguments that can be adapted to change the dataset, model size, number of layers, results folder, etc.
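Only `--task` and `--training_strategy` appear in the commands below; the other flags in this sketch are hypothetical placeholders meant to show the general shape of such a CLI, not the script's actual interface:

```python
import argparse

# Illustrative sketch of an argparse-based CLI like the one main.py exposes.
# --task and --training_strategy come from the README's example commands;
# --results_folder and --n_layers are hypothetical placeholders.
parser = argparse.ArgumentParser(description="sigma-GPT non-NLP training (sketch)")
parser.add_argument("--task", choices=["maze", "vertical"], required=True)
parser.add_argument("--training_strategy", default="shuffle")
parser.add_argument("--results_folder", default="results")  # hypothetical
parser.add_argument("--n_layers", type=int, default=8)      # hypothetical

args = parser.parse_args(["--task", "maze", "--training_strategy", "shuffle"])
```

Check the argparse definitions at the top of `main.py` for the authoritative list of options.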
First, go to the non-nlp folder:
```shell
cd non-nlp
```
Maze:

```shell
python main.py --task maze --training_strategy="shuffle"
```

Vertical:

```shell
python main.py --task vertical --training_strategy="shuffle"
```
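The "shuffle" strategy reflects the core σ-GPT idea from the paper: train on sequences presented in a random order, with the model told both where each seen token sits and which position it must predict next. A simplified sketch of the data layout for one shuffled training example (an illustration of the concept, not the repository's actual implementation):

```python
import numpy as np

def shuffled_example(tokens, rng):
    """Build (inputs, current_pos, target_pos, targets) for one randomly
    ordered pass over the sequence, sigma-GPT style: the model conditions
    on tokens seen so far plus a double positional encoding."""
    n = len(tokens)
    order = rng.permutation(n)      # the random generation order sigma
    inputs = tokens[order[:-1]]     # tokens revealed so far, in that order
    current_pos = order[:-1]        # where each revealed token sits
    target_pos = order[1:]          # which position to predict next
    targets = tokens[order[1:]]     # the token at that position
    return inputs, current_pos, target_pos, targets

rng = np.random.default_rng(0)
tokens = np.array([10, 11, 12, 13, 14])
inp, cur, nxt, tgt = shuffled_example(tokens, rng)
```

Each target is simply the token at the next position of the shuffled order, so the same sequence yields a different supervised example on every pass.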
N.B.: the air traffic dataset is not publicly available yet; the release procedure is ongoing, and the dataset will be linked here once available. Reach out to the author for more information.
This repository contains rather small models, which run without problems in a few hours on modest GPUs.
This software is distributed under the LGPL-3.0 license. See the LICENSE file for more details.
This code was developed as a part of the Innosuisse MALAT: Machine Learning for Air Traffic project, which is a partnership between SkySoft ATM and the Idiap Research Institute.
Main research partner: Prof. François Fleuret (UNIGE)
Project manager: Didier Berling (SkySoft ATM)
Author: Arnaud Pannatier arnaud.pannatier@idiap.ch (Idiap Research Institute).
For any questions/remarks about this work or this research, feel free to contact the author.
If you use this code in your research, please cite the following paper:
```bibtex
@misc{pannatier2024sigmagpts,
  title={{\sigma}-GPTs: A New Approach to Autoregressive Models},
  author={Arnaud Pannatier and Evann Courdier and François Fleuret},
  year={2024},
  eprint={2404.09562},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```