squarezhong/EE340-Project1-Vision-Transformer

EE340-Project1-Vision-Transformer

Introduction

SUSTech EE340 Project 1: MNIST classification.

I implement a simple Vision Transformer (ViT) model in PyTorch.

The ViT model was proposed by Dosovitskiy et al. in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (2020).
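
To make the core idea of the paper concrete, here is a minimal sketch of the first ViT step: cutting an image into fixed-size, non-overlapping patches and flattening each one into a token. It uses plain Python lists instead of PyTorch so it runs without dependencies. The function name `patchify`, the patch size of 7, and the 28x28 input (the standard MNIST resolution) are illustrative assumptions, not values taken from this repository's `main.py`.

```python
def patchify(image, patch_size):
    """Split a 2D image (a list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, top to bottom).
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 28x28 grayscale image where each pixel holds its own index.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
patches = patchify(image, patch_size=7)
print(len(patches), len(patches[0]))  # 16 patches, each with 49 values
```

In the model described by the paper, each flattened patch is then mapped through a learned linear projection, a class token is prepended, and position embeddings are added before the sequence enters the Transformer encoder. With a 7x7 patch size, a 28x28 image becomes a sequence of 16 tokens; the patch size actually used in this project may differ.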

Usage

python main.py

References

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

About

SUSTech EE340 (Statistical Learning for Data Science) Project 1
