The script fine-tunes a Vision Transformer (ViT) to detect whether a hot dog appears in an image, reaching 97% accuracy, while CNNs like VGG and MobileNet top out around 85% given the same training time. ViT has the edge because its self-attention looks at the whole image at once, catching global relationships that CNNs, with their local receptive fields, tend to miss. The key idea is to treat an image as a sequence of patches and let transformer attention pull out global features. That extra global awareness is why ViT shines on a task like this!
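The post doesn't reproduce the script itself, so here is a minimal sketch of what the fine-tuning setup might look like, assuming the Hugging Face `transformers` ViT and the `google/vit-base-patch16-224-in21k` checkpoint (both are assumptions, not confirmed by the post):

```python
# Sketch of fine-tuning a pre-trained ViT for binary hot dog classification.
# Assumes Hugging Face `transformers`; the actual script may differ.
import torch
from transformers import ViTForImageClassification, ViTImageProcessor

# Pre-trained ViT backbone with a freshly initialized 2-class head
# (hot dog / not hot dog). The checkpoint name is an assumption.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(pixel_values, labels):
    """One fine-tuning step: forward pass, cross-entropy loss, backprop."""
    outputs = model(pixel_values=pixel_values, labels=labels)  # loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

Only the small classification head starts from scratch; the attention layers arrive pre-trained, which is why fine-tuning converges quickly on a narrow two-class problem.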
For training, I used Google Colab because it provides access to NVIDIA GPUs (CUDA), which are essential for speeding up ViT fine-tuning. ViT models are computationally demanding because self-attention compares every image patch with every other patch, so running on a GPU makes training far faster and more efficient.
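On Colab the GPU runtime has to be selected explicitly, after which the standard PyTorch device check picks it up. A small sketch (variable names are illustrative):

```python
# Sketch: use the Colab GPU when one is attached, fall back to CPU otherwise.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Each batch must live on the same device as the model, e.g.:
# pixel_values, labels = pixel_values.to(device), labels.to(device)
```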