Convolutional vision Transformer (CvT), implemented for TensorFlow

cyborg-ai-ch/CvT-tf

CvT TensorFlow Implementation

Dear friends: We implemented the Convolutional vision Transformer (CvT) in TensorFlow (version > 2.5). The goal was to better understand the concept and the architecture. Please feel free to use and improve the model.
CvT original code GitHub: [https://github.com/microsoft/CvT]
Paper: CvT: Introducing Convolutions to Vision Transformers

Our Implementation Schema

Testing implementation

Trained on CIFAR100
Data set: 60000 images in 100 object categories
Training set: 50000 images (500 images per category)
Validation set: 10000 images (100 images per category)

Results

CIFAR100 was trained from scratch.
We use data augmentation together with image resizing and obtain the result below.

Model    Resolution    Params    Top-1 (%)    Hardware
CvT-1    72x72         3.5M      59.0         2x RTX 2080
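For readers unfamiliar with the metric, Top-1 is the percentage of images whose highest-scoring class equals the true label. The following is a minimal sketch of that definition in plain Python; the function name `top1_accuracy` is hypothetical and this is not the evaluation code used in this repository.

```python
def top1_accuracy(scores, labels) -> float:
    """Percentage of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example with 4 samples and 3 classes (made-up numbers, not model output).
example_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
example_labels = [1, 0, 1, 1]
example_result = top1_accuracy(example_scores, example_labels)
```

In the toy example three of the four predictions match their labels.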

Please see the option details in config/config.py.

Options         Stage 1    Stage 2    Stage 3    Remark
MODEL
NUM_STAGES      -          -          -          3
CLS_TOKEN       -          -          -          TRUE
EMBEDDING
PATCH_SIZE      6          3          3
PATCH_STRIDE    3          2          2
DIM_EMBED       32         64         128
STAGE
DEPTH           1          2          6          No dropout
ATTENTION
NUM_HEADS       1          3          6
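
To make the table concrete, the sketch below collects the per-stage values in a small Python structure and estimates the side length of the token grid each stage would produce for a 72x72 input. The names `StageConfig`, `STAGES`, `conv_output_size` and `token_grid_sizes` are hypothetical and are not taken from config/config.py. The padding used by the convolutional embedding is not stated here, so both "valid" (no padding) and "same" padding are shown as assumptions.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class StageConfig:
    """Per-stage hyper-parameters copied from the options table."""
    patch_size: int
    patch_stride: int
    dim_embed: int
    depth: int
    num_heads: int


# Values from the table; the structure itself is illustrative only.
STAGES = (
    StageConfig(patch_size=6, patch_stride=3, dim_embed=32, depth=1, num_heads=1),
    StageConfig(patch_size=3, patch_stride=2, dim_embed=64, depth=2, num_heads=3),
    StageConfig(patch_size=3, patch_stride=2, dim_embed=128, depth=6, num_heads=6),
)


def conv_output_size(size: int, kernel: int, stride: int, padding: str) -> int:
    """Spatial output size of a strided convolution for 'valid' or 'same' padding."""
    if padding == "valid":
        return (size - kernel) // stride + 1
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding mode: {padding}")


def token_grid_sizes(input_size: int, padding: str) -> list:
    """Side length of the token grid after each stage's convolutional embedding."""
    sizes = []
    size = input_size
    for stage in STAGES:
        size = conv_output_size(size, stage.patch_size, stage.patch_stride, padding)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    for mode in ("valid", "same"):
        print(mode, token_grid_sizes(72, mode))
```

Under these assumptions a 72x72 image yields grids of 23, 11 and 5 tokens per side with "valid" padding, or 24, 12 and 6 with "same" padding.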

Usage

Installation

Before installing the dependencies, consider using a virtual environment. It can be created with:

python3 -m venv <folder name>
# Then activate the environment by running the generated activate
# script in <folder name> for your OS, e.g. activate.bat on Windows.

The necessary packages are listed in requirements.txt. They can be installed using:

pip install -r requirements.txt

For the installation of the optional CUDA drivers, please refer to the TensorFlow documentation.

Configuration

The model can be configured with the hyper-parameters in config/config.py.

Training

To start the training without changing the dataset, the learning rate or the learning rate schedule, just run main.py:

python main.py 

If you want to change these values, open main.py with an editor and change the parameters of the train function at the bottom of the file.

model, figure = train(cifar_loader,
                      epochs=300,
                      batch_size=512,
                      start_weights="",
                      learning_rate=1e-3,
                      learning_rate_schedule=schedule)
Training Parameters:
  • cifar_loader

    The loader of the dataset (consult dataloader/DataLoader.py for more information).

  • epochs

    The number of epochs to train for.

  • batch_size

    The number of images per batch.

  • start_weights

    The name of the file in the weights folder that contains pretrained weights to load before starting the training.

  • learning_rate

    The learning rate.

  • learning_rate_schedule

    The learning rate schedule (e.g. a cosine decay).
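
As an illustration of the cosine decay mentioned for learning_rate_schedule, here is a minimal sketch in plain Python. It assumes the schedule is a callable that maps an epoch index to a learning rate; whether train expects this form or, for example, a Keras LearningRateSchedule object is not stated here, so check main.py before using it. The helper name `make_cosine_schedule` is hypothetical.

```python
import math


def make_cosine_schedule(initial_learning_rate: float,
                         total_epochs: int,
                         min_learning_rate: float = 0.0):
    """Return a function mapping an epoch index to a cosine-decayed learning rate."""
    def schedule(epoch: int) -> float:
        # Clamp so that epochs outside [0, total_epochs] stay at the end values.
        progress = min(max(epoch, 0), total_epochs) / total_epochs
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_learning_rate + (initial_learning_rate - min_learning_rate) * cosine
    return schedule


# Matches the example call above: 300 epochs starting at 1e-3 (assumed pairing).
schedule = make_cosine_schedule(initial_learning_rate=1e-3, total_epochs=300)
```

The rate starts at the initial value, reaches half of it at the midpoint and decays to the minimum at the last epoch.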

Note that the training can be stopped at any time by focusing on the plot and holding the key 'q'.

Pressing 'h' or 'r' while focusing on the plot will resize it to fit the data.

Testing

To test your model, call the test function found in main.py:

figure = test(model, cifar_loader, number_of_images=5000, split="test", seed=None)
Test Parameters:
  • model

    Your trained model.

  • cifar_loader

    The dataset loader, same as in train.

  • number_of_images

    Determines how many images to use for the test.

  • split

    Either "test" or "train": the dataset split to take images from (usually "test").

  • seed

    The random seed used to choose the images. If the value is None, os.urandom is used instead.
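
The effect of number_of_images and seed can be sketched as follows. This is only an illustration using Python's standard random module, not the selection code of this repository, and the name `choose_image_indices` is hypothetical.

```python
import random


def choose_image_indices(dataset_size: int, number_of_images: int, seed=None) -> list:
    """Pick distinct image indices; a fixed seed makes the choice repeatable."""
    # With seed=None, random.Random seeds itself from OS entropy
    # (os.urandom) when the operating system provides it.
    rng = random.Random(seed)
    return rng.sample(range(dataset_size), number_of_images)


# The same seed selects the same 5000 of the 10000 test images every time.
first = choose_image_indices(10000, 5000, seed=42)
second = choose_image_indices(10000, 5000, seed=42)
```

Passing a fixed seed is useful when two models should be compared on exactly the same subset of images.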
