- Model: Several transformer-based milestone models are reimplemented from scratch in PyTorch
- Vision Transformer (CV): An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Vanilla Transformer (NLP): Attention Is All You Need
- Experiments: Conduct experiments on CV/NLP benchmarks, respectively
- Image Classification
- Neural Machine Translation
- Pipeline: End-to-end pipeline
- Convenient to play with: data processing and model training/validation are integrated into a one-stop-shop pipeline
- Efficient training: training and evaluation are accelerated via DistributedDataParallel (DDP) and mixed precision (fp16); see the sketch after this list
- Neat to read: a neat file structure that is easy to read yet non-trivial
- ./script → run train/eval
- ./model → model implementation
- ./data → data processing
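A minimal sketch of the DDP setup this pipeline relies on; the `setup_ddp_model` helper and the `LOCAL_RANK` handling are illustrative, not the repo's exact code:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical helper: wrap a model for multi-GPU training the way the
# scripts do. LOCAL_RANK is set by torch.distributed.launch (with
# --use_env); some launchers pass --local_rank as an argument instead.
def setup_ddp_model(model: torch.nn.Module) -> DDP:
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")   # NCCL backend for GPUs
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(), device_ids=[local_rank])
```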
# Conda Env
python 3.6.10
torch 1.4.0+cu100
torchvision 0.5.0+cu100
torchtext 0.5.0
spacy 3.4.1
tqdm 4.63.0
# Apex (for mixed precision training)
## run `gcc --version`
gcc (GCC) 5.4.0
## apex installation
git clone https://github.com/NVIDIA/apex
cd apex
git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0
rm -rf ./build
python setup.py install --cuda_ext --cpp_ext
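Once Apex is installed, mixed precision training follows the `apex.amp` pattern below; this is a hedged sketch with a toy model, not the repo's actual training loop:

```python
import torch
from apex import amp  # requires the Apex build above

# Toy model/optimizer standing in for the real ones.
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O1 = patch ops to run in fp16 where safe (Apex's recommended default).
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 16).cuda()
target = torch.randint(0, 4, (8,)).cuda()
loss = torch.nn.functional.cross_entropy(model(x), target)

# Backward on the amp-scaled loss to avoid fp16 gradient underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```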
# System Env
## run `nvcc --version`
Cuda compilation tools, release 10.0, V10.0.130
## run `nvidia-smi`
Check your own GPU device status
- multi30k and cifar10 can be downloaded automatically by the pipeline
- imagenet1k (ILSVRC2012) needs a manual download (see "Guide for downloading imagenet1k" below)
- Wait until all three files finish downloading.
- ILSVRC2012_devkit_t12.tar.gz (2.5M)
- ILSVRC2012_img_train.tar (138G)
- ILSVRC2012_img_val.tar (6.3G)
- Run the imagenet1k pipeline; ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar will be automatically extracted and arranged into the directories 'data/ILSVRC2012/train' and 'data/ILSVRC2012/val'.
- Note that extraction takes several hours; you can also do it faster yourself in the shell (see the sketch after the download guide below).
# Guide for downloading imagenet1k
mkdir -p data/ILSVRC2012
cd data/ILSVRC2012
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz --no-check-certificate
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar --no-check-certificate
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar --no-check-certificate
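If you prefer not to wait for the pipeline's extraction, below is a minimal Python sketch of unpacking the train archive, whose members are 1,000 per-class tars; paths mirror the 'data/ILSVRC2012/train' layout above. The val split additionally needs the devkit's label mapping, which is omitted here:

```python
import os
import tarfile

# Extract ILSVRC2012_img_train.tar: each member is itself a tar named
# after a WordNet id (e.g. n01440764.tar) holding that class's images.
root = "data/ILSVRC2012"
train_dir = os.path.join(root, "train")
os.makedirs(train_dir, exist_ok=True)

with tarfile.open(os.path.join(root, "ILSVRC2012_img_train.tar")) as outer:
    for member in outer:
        wnid = os.path.splitext(member.name)[0]      # e.g. "n01440764"
        class_dir = os.path.join(train_dir, wnid)
        os.makedirs(class_dir, exist_ok=True)
        with tarfile.open(fileobj=outer.extractfile(member)) as inner:
            inner.extractall(class_dir)              # images for this class
```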
- Download the pretrained ViT_B_16 model parameters from the official storage.
cd data
curl -o ViT_B_16.npz https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz
curl -o ViT_B_16_384.npz https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz
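The downloaded checkpoints are flat `.npz` archives with JAX/Flax-style parameter names; mapping them onto this repo's PyTorch modules is repo-specific, so the sketch below only shows how to inspect them:

```python
import numpy as np

# Peek at the pretrained ViT_B_16 checkpoint downloaded above.
weights = np.load("data/ViT_B_16.npz")
for name in sorted(weights.files)[:5]:
    print(name, weights[name].shape)   # parameter name and array shape
```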
- Before running experiments
- Set the CUDA env in script/run_image_cls_task.py/__main__ according to your GPU devices (a hedged example follows this list)
- Adjust the train/eval settings in script/run_image_cls_task.py/get_args() and launch the experiment
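A hedged example of the "CUDA env" adjustment in `__main__` (the exact lines in the script may differ):

```python
import os

# Expose only the GPUs you want the experiment to use, e.g. four P40s.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
```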
cd script
# run experiments on cifar10
# (4 mins/epoch, 3.5 hours in total | GPU device: P40×4)
python ./run_image_cls_task.py cifar10
# run experiments on imagenet1k
# (less than 5 hours/epoch, more than 10 hours in total | GPU device: P40×4)
python ./run_image_cls_task.py ILSVRC2012
# Tips:
# 1. Both DDP and fp16 mixed precision training are adopted for acceleration
# 2. The actual speedup depends on your specific GPU devices
- Before running experiments
- Set the CUDA env in script/run_nmt_task.py/__main__ according to your GPU devices
- Adjust the train/eval settings in script/run_nmt_task.py/get_args() (sketched below) and launch the experiment
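For reference, `get_args()` is typically an argparse entry point along these lines; the flag names here are hypothetical, so check the real file for the actual options before launching:

```python
import argparse

# Hypothetical shape of script/run_nmt_task.py's get_args(); the repo's
# real flags may differ.
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=512)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--beam_size", type=int, default=4)
    return parser.parse_args()
```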
# run experiments on multi30k
# (small dataset, 3 mins in total | GPU device: P40×4 | you can also fork
# and adjust the pipeline to run this experiment on a smaller GPU)
cd script
python ./run_nmt_task.py multi30k
# Tips:
# 1. DDP is adopted for acceleration
# 2. For inference, both "greedy search" and "beam search" are included in the nmt task pipeline (a greedy-decoding sketch follows below)
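As mentioned in the tips, inference supports greedy and beam search. Below is a minimal greedy-decoding sketch; the `encode`/`decode` methods and the token ids are assumptions about the model interface, not the pipeline's exact API:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=64):
    # Assumed interface: model.encode(src) -> memory, and
    # model.decode(tgt, memory) -> (1, T, vocab) logits.
    memory = model.encode(src)
    ys = torch.full((1, 1), bos_id, dtype=torch.long, device=src.device)
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)     # append the greedy token
        if next_tok.item() == eos_id:             # stop at end-of-sentence
            break
    return ys.squeeze(0)
```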
- This repo
- Imagenet1k: ACC 84.9% (result on the 50,000-image val set | resolution 384 | extra label smoothing with confidence 0.9 | batch size 160, nearly 15,000 training steps)
- Cifar10: ACC 99.04% (resolution 224 | batch size 640, nearly 5,500 training steps)
- Comparison to the official results of the ViT implementation by Google
- This repo
- Multi30k: BLEU 38.6 (en→de | nearly 17M params | batch size 512, nearly 1,200 training steps)
- Comparison to results in Dynamic Context-guided Capsule Network for Multimodal Machine Translation
- Transformer Survey
- Vanilla Transformer Component Structures (a self-attention sketch follows at the end of this section)
- self-attention
- multi-head attention
- feed-forward network
- residual connection & layer norm
- label smoothing
- Recent Transformer Milestone Work in CV
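To make the first component above concrete, here is a minimal scaled dot-product self-attention sketch (shapes `(batch, seq, d_model)` assumed; illustrative, not the repo's exact implementation):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # scores: (B, T, T) similarity of every query with every key,
    # scaled by sqrt(d_k) as in "Attention Is All You Need".
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = scores.softmax(dim=-1)    # attention weights over keys
    return attn @ v                  # weighted sum of values
```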