This is an implementation of a simple Vision Transformer (ViT) for a dummy classification task of predicting whether a person is wearing a hat or not. 🎩🤠🎓

logic-OT/ViT-FROM-SCRATCH

Simple Vision Transformer Implementation With PyTorch

This repo implements a simple Vision Transformer (ViT) for a dummy classification task: predicting whether a person is wearing a hat or not.

Key Components

  1. Image patcher and depatcher
  2. Positional Encoding and Transformer Encoder (my implementation follows "Attention Is All You Need")
  3. Model Architecture
  4. Attention Visualisation
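
As a rough illustration of the first component, here is a minimal sketch of an image patcher and depatcher. The function names `patchify` and `depatchify`, their signatures, and the patch ordering are assumptions made for this example; the repo's own implementation may differ.

```python
import torch


def patchify(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split (B, C, H, W) images into (B, N, C * P * P) flattened patches.

    Hypothetical helper: patches are ordered row by row over the patch grid.
    """
    b, c, h, w = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "H and W must be divisible by patch_size"
    # (B, C, H/P, P, W/P, P): split each spatial axis into (grid index, offset in patch)
    x = images.reshape(b, c, h // p, p, w // p, p)
    # (B, H/P, W/P, C, P, P): move the two grid axes in front of the patch content
    x = x.permute(0, 2, 4, 1, 3, 5)
    return x.reshape(b, (h // p) * (w // p), c * p * p)


def depatchify(patches: torch.Tensor, patch_size: int,
               channels: int, height: int, width: int) -> torch.Tensor:
    """Inverse of patchify: rebuild (B, C, H, W) images from flattened patches."""
    b = patches.shape[0]
    p = patch_size
    x = patches.reshape(b, height // p, width // p, channels, p, p)
    # Back to (B, C, H/P, P, W/P, P), then merge grid and offset axes
    x = x.permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, channels, height, width)


images = torch.arange(2 * 3 * 8 * 8, dtype=torch.float32).reshape(2, 3, 8, 8)
patches = patchify(images, patch_size=4)          # shape (2, 4, 48)
restored = depatchify(patches, 4, channels=3, height=8, width=8)
```

The round trip is lossless because both functions only reshape and permute axes; no pixel values are changed.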

Architecture design
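
The architecture figure is not reproduced here. As a hedged sketch of how the pieces of a small ViT classifier typically fit together, the example below uses PyTorch's built-in `nn.TransformerEncoder` rather than the repo's own encoder. The class name `TinyViT`, every hyperparameter, the learned position embedding (the repo uses the encoding from "Attention Is All You Need") and the mean pooling are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class TinyViT(nn.Module):
    """Hypothetical minimal ViT: patch embedding -> encoder -> pooled classifier."""

    def __init__(self, image_size=64, patch_size=8, channels=3,
                 embed_dim=64, num_heads=4, depth=2, num_classes=2):
        super().__init__()
        assert image_size % patch_size == 0
        num_patches = (image_size // patch_size) ** 2
        # A convolution with kernel == stride == patch size embeds each
        # non-overlapping patch with one shared linear projection.
        self.embed = nn.Conv2d(channels, embed_dim,
                               kernel_size=patch_size, stride=patch_size)
        # Assumption: learned position embedding, swapped in for brevity.
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=2 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.embed(images)               # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)     # (B, N, D)
        x = self.encoder(x + self.pos)       # (B, N, D)
        return self.head(x.mean(dim=1))      # (B, num_classes), mean over patches


model = TinyViT().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 64, 64))   # two classes: hat / no hat
```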

Repo Content

This repo contains 3 files: transformer_utils.py, ViT Experiments.py and OBSERVATIONS.txt

  1. transformer_utils.py: Contains the key components of the transformer encoder. In this file, you will find:

    • A Single-head self attention layer implementation
    • A Multi-head self attention layer
    • A Positional encoder
    • A Transformer encoder, which you can download and import to quickly build your own architecture
  2. ViT Experiments.py: The notebook where I trained my transformer model to classify hat vs. no-hat images. All preprocessing and visualisations are done here.

  3. OBSERVATIONS.txt: A summary of the observations and corrections I made, which I think will help with better understanding.
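
To make the components listed for transformer_utils.py more concrete, here is a minimal sketch of a single-head self-attention layer and a sinusoidal positional encoder in the style of "Attention Is All You Need". The names `SingleHeadSelfAttention` and `sinusoidal_positional_encoding` and their signatures are assumptions for this example, not the file's actual API.

```python
import math

import torch
import torch.nn as nn


class SingleHeadSelfAttention(nn.Module):
    """Hypothetical single-head scaled dot-product self-attention."""

    def __init__(self, embed_dim: int, head_dim: int):
        super().__init__()
        self.query = nn.Linear(embed_dim, head_dim)
        self.key = nn.Linear(embed_dim, head_dim)
        self.value = nn.Linear(embed_dim, head_dim)
        self.scale = 1.0 / math.sqrt(head_dim)

    def forward(self, x: torch.Tensor):
        # x: (B, N, embed_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) * self.scale   # (B, N, N)
        weights = scores.softmax(dim=-1)                # each row sums to 1
        return weights @ v, weights                     # (B, N, head_dim), (B, N, N)


def sinusoidal_positional_encoding(num_positions: int, embed_dim: int) -> torch.Tensor:
    """Fixed sin/cos encoding: even columns use sine, odd columns use cosine."""
    assert embed_dim % 2 == 0, "embed_dim must be even"
    position = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, embed_dim, 2, dtype=torch.float32)
        * (-math.log(10000.0) / embed_dim))
    pe = torch.zeros(num_positions, embed_dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe                                           # (num_positions, embed_dim)


tokens = torch.randn(2, 5, 16)                 # 2 images, 5 patch tokens each
pe = sinusoidal_positional_encoding(5, 16)
attention = SingleHeadSelfAttention(embed_dim=16, head_dim=8)
out, weights = attention(tokens + pe)          # pe broadcasts over the batch
```

A multi-head layer can be built from this by running several such heads in parallel, concatenating their outputs along the last dimension and applying one more linear projection.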

Data

This is a small dataset of 471 images. Link to the data is: here

Encoder Design

encoder1 design
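
The encoder figure is not reproduced here. As a sketch only, the block below shows one post-norm encoder layer as described in "Attention Is All You Need" (attention, add and norm, feed-forward, add and norm). It uses PyTorch's built-in `nn.MultiheadAttention` instead of the repo's own multi-head layer; the class name `EncoderBlock` and all sizes are assumptions.

```python
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """Hypothetical post-norm Transformer encoder block."""

    def __init__(self, embed_dim: int, num_heads: int, ff_dim: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.norm2 = nn.LayerNorm(embed_dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor):
        # Self-attention: queries, keys and values all come from x.
        attn_out, attn_weights = self.attn(x, x, x, need_weights=True)
        x = self.norm1(x + self.drop(attn_out))      # residual, then LayerNorm
        x = self.norm2(x + self.drop(self.ff(x)))    # residual, then LayerNorm
        return x, attn_weights                       # weights are averaged over heads


block = EncoderBlock(embed_dim=32, num_heads=4, ff_dim=64).eval()
with torch.no_grad():
    y, w = block(torch.randn(2, 10, 32))
```

Returning the attention weights alongside the output keeps them available for visualisation later without a second forward pass.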

Performance

The emphasis of this repo is on model architecture rather than performance. That said, the results were good:

  • EPOCHS: 100
  • TRAIN: 0.93
  • TEST: 0.91

Attention Visualisation Examples

(Four attention visualisation images: download (15), download (18), download (17), download (16))
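
One common way to produce attention maps like these is to turn an attention matrix into a per-patch score, reshape it onto the patch grid and upsample it to the image size. The sketch below assumes an `(N, N)` attention matrix over `N` patches with no class token, and averages over queries; the function name `attention_to_heatmap` is hypothetical and the repo's visualisation code may work differently.

```python
import torch
import torch.nn.functional as F


def attention_to_heatmap(attn_weights: torch.Tensor,
                         grid_size: tuple, image_size: tuple) -> torch.Tensor:
    """Convert an (N, N) attention matrix into an (H, W) heatmap in [0, 1].

    Assumes N == grid_h * grid_w and that row i holds the attention
    that query patch i pays to every patch.
    """
    grid_h, grid_w = grid_size
    # Average over queries: how much attention each patch receives overall.
    received = attn_weights.mean(dim=0)                       # (N,)
    heat = received.reshape(1, 1, grid_h, grid_w)
    # Upsample the coarse patch grid to the full image resolution.
    heat = F.interpolate(heat, size=image_size,
                         mode="bilinear", align_corners=False)[0, 0]
    # Min-max normalise so the map can be overlaid with a colour map.
    return (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)


attn = torch.softmax(torch.randn(16, 16), dim=-1)    # stand-in for real weights
heatmap = attention_to_heatmap(attn, grid_size=(4, 4), image_size=(32, 32))
```

The resulting tensor can be drawn over the input image, for example with matplotlib's `imshow` and an `alpha` value below 1.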
