This repo implements a simple Vision Transformer (ViT) for a dummy classification task: predicting whether a person is wearing a hat or not. It covers:
- Image patcher and depatcher (see the patching sketch after this list)
- Positional encoding and Transformer encoder (my implementation follows "Attention Is All You Need")
- Model architecture
- Attention visualisation
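The patcher/depatcher mentioned above boils down to a few tensor reshapes. Below is a minimal sketch assuming PyTorch tensors, square images, and non-overlapping patches; the function names and defaults are illustrative, not the repo's actual API.

```python
import torch

def patchify(images, patch_size=16):
    """Split a batch of images (B, C, H, W) into flattened patches
    (B, num_patches, C * patch_size**2). Hypothetical helper; the
    repo's actual patcher may differ."""
    B, C, H, W = images.shape
    # Carve H and W into non-overlapping patch_size blocks.
    p = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # (B, C, H/ps, W/ps, ps, ps) -> (B, num_patches, C * ps * ps)
    p = p.permute(0, 2, 3, 1, 4, 5).contiguous()
    return p.view(B, -1, C * patch_size * patch_size)

def depatchify(patches, image_size=224, patch_size=16, channels=3):
    """Inverse of patchify: reassemble flattened patches into images."""
    B = patches.shape[0]
    grid = image_size // patch_size
    p = patches.view(B, grid, grid, channels, patch_size, patch_size)
    p = p.permute(0, 3, 1, 4, 2, 5).contiguous()
    return p.view(B, channels, image_size, image_size)

# Round trip: depatchify(patchify(x)) recovers x for 224x224 RGB images.
x = torch.randn(2, 3, 224, 224)
assert torch.equal(depatchify(patchify(x)), x)
```

The depatcher is mainly useful for mapping attention weights back onto image patches when visualising what the model looks at.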
This repo contains three files: transformer_utils.py, ViT Experiments.py, and OBSERVATIONS.txt.
transformer_utils.py: contains the key components of the transformer encoder. In this file you will find:
- A single-head self-attention layer implementation (see the sketch after this list)
- A multi-head self-attention layer
- A positional encoder
- A transformer encoder, which you can download and import to quickly build your own architecture
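At their core, the attention and positional-encoding pieces are small. Here is a minimal PyTorch-style reference sketch of single-head scaled dot-product self-attention and the sinusoidal positional encoding from "Attention Is All You Need"; the class/function names and shapes are assumptions, not necessarily the API exposed by transformer_utils.py.

```python
import math
import torch
import torch.nn as nn

class SingleHeadSelfAttention(nn.Module):
    """Illustrative single-head self-attention (scaled dot-product)."""
    def __init__(self, embed_dim):
        super().__init__()
        self.q = nn.Linear(embed_dim, embed_dim)
        self.k = nn.Linear(embed_dim, embed_dim)
        self.v = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim), e.g. a sequence of patch embeddings
        q, k, v = self.q(x), self.k(x), self.v(x)
        # softmax(Q K^T / sqrt(d)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = scores.softmax(dim=-1)  # attention map, useful for visualisation
        return weights @ v

def sinusoidal_positional_encoding(seq_len, embed_dim):
    """Fixed sin/cos positional encoding from "Attention Is All You Need".
    Assumes an even embed_dim."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, embed_dim, 2, dtype=torch.float32)
    angles = pos / (10000 ** (i / embed_dim))
    pe = torch.zeros(seq_len, embed_dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe  # (seq_len, embed_dim), added to the patch embeddings
```

The multi-head version simply runs several such heads on lower-dimensional projections and concatenates their outputs before a final linear layer.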
ViT Experiments.py: the notebook where I trained my transformer model to classify hat vs. no-hat images. All preprocessing and visualisation are done here.
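For orientation, a training run in the notebook comes down to a standard supervised loop. The sketch below uses a placeholder model and random tensors so it is self-contained; the real notebook uses the ViT assembled from transformer_utils.py and the 471-image hat dataset, trained for 100 epochs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 224x224 RGB images, labels hat = 1 / no hat = 0.
images = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

# Placeholder model; the notebook uses the ViT built from transformer_utils.py.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for epoch in range(5):  # the repo's runs use 100 epochs
    for x, y in loader:
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```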
OBSERVATIONS.txt: a summary of the observations and corrections I made, which I think will help with better understanding.
The dataset is small (471 images). Link to the data: here
The emphasis of this repo was more on model architecture than on performance. That being said, the results were good:
- EPOCHS: 100
- TRAIN: 0.93
- TEST: 0.91