Dear friends: We implemented the Convolutional vision Transformer (CvT) in TensorFlow (version > 2.5).
The goal was to better understand the concept and architecture. Please feel free to use and improve the model.
CvT original code GitHub: [https://github.com/microsoft/CvT]
Paper: CvT: Introducing Convolutions to Vision Transformers
Trained on CIFAR100
Data set: 60000 images and 100 object categories
Training set: 50000 images (500 images per category)
Validation set: 10000 images (100 images per category)
The model was trained on CIFAR100 from scratch.
We use data augmentation with image resizing and obtain the following result:
Model | Resolution | Param | Top-1 | Hardware |
CvT-1 | 72x72 | 3.5M | 59.0 | 2x RTX 2080 |
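For readers unfamiliar with the Top-1 column: it is the percentage of validation images whose highest-scoring class equals the true label. The following is a minimal sketch of that metric in plain Python. The helper name `top1_accuracy` and the example scores are hypothetical and not taken from this repository.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label.

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Made-up scores for four images and three classes.
example_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
example_labels = [1, 0, 1, 0]
print(top1_accuracy(example_scores, example_labels))
```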
Please see option details in config.py
Options | Stage 1 | Stage 2 | Stage 3 | Remark |
Model | | | | |
NUM_STAGES | - | - | - | 3 |
CLS_TOKEN | - | - | - | TRUE |
EMBEDDING | | | | |
PATCH_SIZE | 6 | 3 | 3 | |
PATCH_STRIDE | 3 | 2 | 2 | |
DIM_EMBED | 32 | 64 | 128 | |
STAGE | | | | |
DEPTH | 1 | 2 | 6 | No dropout |
ATTENTION | | | | |
NUM_HEADS | 1 | 3 | 6 | |
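To get a feeling for how the embedding options shrink the feature map, the sketch below computes the spatial size after each stage for a 72x72 input. The padding mode of the convolutional embedding is not stated here, so both common modes ("same" and "valid") are shown as assumptions. The function names are hypothetical; check the model code for the padding actually used.

```python
import math

INPUT_SIZE = 72             # resolution from the results table
PATCH_SIZE = (6, 3, 3)      # kernel size of the embedding convolution per stage
PATCH_STRIDE = (3, 2, 2)    # stride of the embedding convolution per stage
DIM_EMBED = (32, 64, 128)   # channels (token dimension) per stage


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a strided convolution along one axis."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def stage_shapes(input_size, padding):
    """Return (height, width, channels) after each stage's embedding."""
    shapes = []
    size = input_size
    for kernel, stride, dim in zip(PATCH_SIZE, PATCH_STRIDE, DIM_EMBED):
        size = conv_output_size(size, kernel, stride, padding)
        shapes.append((size, size, dim))
    return shapes


# The padding mode is an assumption, so print both variants.
for mode in ("same", "valid"):
    print(mode, stage_shapes(INPUT_SIZE, mode))
```

Under "same" padding the grid goes 24, 12, 6 per side; under "valid" padding it goes 23, 11, 5.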
Before installing the dependencies, consider using a virtual environment. It can be created with:
python3 -m venv <folder name>
# Activate the environment by running the generated activate
# script in <folder name> for your OS, e.g. activate.bat on Windows.
The necessary packages are listed in requirements.txt. They can be installed using:
pip install -r requirements.txt
For the installation of the optional CUDA drivers, please refer to the TensorFlow documentation.
The model can be configured with the hyperparameters in config/config.py.
To start training without changing the dataset, learning rate, or learning rate schedule, just run main.py:
python main.py
If you want to change these values, open main.py with an editor and change the parameters of the train function at the bottom of the file.
model, figure = train(cifar_loader,
                      epochs=300,
                      batch_size=512,
                      start_weights="",
                      learning_rate=1e-3,
                      learning_rate_schedule=schedule)
- cifar_loader: The dataset loader (consult dataloader/DataLoader.py for more information).
- epochs: The number of epochs to train for.
- batch_size: The number of images per batch.
- start_weights: The name of a file in the weights folder containing pretrained weights to load before training starts.
- learning_rate: The learning rate.
- learning_rate_schedule: The learning rate schedule (e.g. a cosine decay).
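As an illustration of what a cosine decay schedule does, here is a minimal sketch in plain Python. It is not the `schedule` object used in main.py. The function name `cosine_decay` and its `alpha` parameter are assumptions, and the step count assumes the last partial batch of each epoch is kept.

```python
import math


def cosine_decay(step, total_steps, initial_lr=1e-3, alpha=0.0):
    """Learning rate at `step`, decayed along half a cosine wave.

    Hypothetical helper; `alpha` is the final fraction of `initial_lr`.
    """
    step = min(step, total_steps)  # hold the final value after the last step
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)


# Step count for the default call: 50000 training images,
# batch_size=512, epochs=300 (assuming the partial last batch is kept).
steps_per_epoch = math.ceil(50000 / 512)
total_steps = 300 * steps_per_epoch

for step in (0, total_steps // 2, total_steps):
    print(step, cosine_decay(step, total_steps))
```

The rate starts at `learning_rate`, reaches half of it midway through training, and ends at `alpha * learning_rate`. TensorFlow ships a schedule with the same formula as `tf.keras.optimizers.schedules.CosineDecay`.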
Note that training can be stopped at any time by focusing the plot window and holding the 'q' key.
Pressing 'h' or 'r' while the plot is focused resizes it to fit the data.
To test your model, call the test function found in main.py:
figure = test(model, cifar_loader, number_of_images=5000, split="test", seed=None)
- model: Your trained model.
- cifar_loader: The dataset loader, same as in train.
- number_of_images: The number of images to use for the test.
- split: "test" or "train", the dataset split to take images from (usually "test").
- seed: The random seed used to choose images. If the value is None, os.urandom is used instead.
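The seed behaviour described above can be sketched as follows. This is an assumption about how such a selection might work, not the code in main.py, and the helper name `choose_image_indices` is hypothetical.

```python
import os
import random


def choose_image_indices(dataset_size, number_of_images, seed=None):
    """Pick `number_of_images` distinct indices from a split.

    Hypothetical helper: a fixed seed gives a reproducible selection,
    while seed=None draws a fresh seed from os.urandom.
    """
    if number_of_images > dataset_size:
        raise ValueError("number_of_images exceeds the size of the split")
    if seed is None:
        seed = int.from_bytes(os.urandom(8), "big")
    rng = random.Random(seed)
    return rng.sample(range(dataset_size), number_of_images)


# Same seed, same images: useful when comparing two sets of weights.
first = choose_image_indices(10000, 5000, seed=42)
second = choose_image_indices(10000, 5000, seed=42)
print(first == second)
```

Passing an explicit seed is useful when two models should be evaluated on exactly the same subset of the test split.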