EE340-Project1-Vision-Transformer

Introduction

SUSTech EE340 Project 1: MNIST classification.

I implement a simple Vision Transformer (ViT) model in PyTorch.

The ViT architecture was proposed by Dosovitskiy et al. in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (2020).
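For orientation, here is a minimal sketch of how a ViT for 28x28 MNIST digits might look. It is not the code in transformer.py: the class name `TinyViT`, the patch size of 7, and every other hyperparameter are assumptions chosen for illustration, and it relies on PyTorch's built-in `nn.TransformerEncoder` rather than a hand-written encoder.

```python
import torch
import torch.nn as nn


class TinyViT(nn.Module):
    """Illustrative ViT for 28x28 grayscale images (hypothetical hyperparameters)."""

    def __init__(self, image_size=28, patch_size=7, dim=64, depth=2, heads=4, num_classes=10):
        super().__init__()
        assert image_size % patch_size == 0, "image size must be divisible by patch size"
        num_patches = (image_size // patch_size) ** 2  # 16 patches for 28 / 7

        # A strided convolution cuts the image into non-overlapping patches
        # and linearly projects each patch to a `dim`-dimensional token.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)

        # Learnable class token and position embeddings, as described in the ViT paper.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches + 1, dim) * 0.02)

        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 2, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # x: (batch, 1, 28, 28)
        tokens = self.patch_embed(x)                # (batch, dim, 4, 4)
        tokens = tokens.flatten(2).transpose(1, 2)  # (batch, 16, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        # Classify from the class token only.
        return self.head(self.norm(tokens[:, 0]))


model = TinyViT()
logits = model(torch.randn(8, 1, 28, 28))  # a random batch standing in for MNIST images
print(logits.shape)
```

Each image becomes a sequence of 16 patch tokens plus one class token, and the classifier reads only the class token's final representation. A smaller patch size gives a longer sequence and a finer spatial resolution at a higher attention cost.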

Run the project with:

python main.py

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
