This repository contains supplementary material for my Master's thesis, Fine-grained Visual Recognition with Side Information.
The thesis presents a method for fine-grained visual recognition of snake and fungi species using side information. The proposed method builds on state-of-the-art deep neural networks for image classification: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). The performance improvements come from:
- adopting loss functions proposed to address the class imbalance;
- adjusting predictions by prior probabilities of side information like location and time of observation;
- applying a weakly supervised method to localize snakes and fungi in images and crop the images based on the detected regions to enrich the training data.
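The README does not name the class-imbalance losses used in the thesis. As one illustration, here is a minimal stdlib-only sketch of the focal loss (Lin et al., 2017), a widely used loss of this kind. The function `focal_loss` and its arguments are hypothetical and not taken from this repository, which works with PyTorch tensors rather than Python lists.

```python
import math


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample (illustrative sketch, not the thesis code).

    logits: list of raw class scores; target: index of the true class.
    alpha:  optional per-class weights, e.g. inverse class frequencies.
    With gamma=0 and alpha=None this reduces to plain cross-entropy.
    """
    # Numerically stable softmax probability of the true class.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    p_t = exps[target] / sum(exps)

    # (1 - p_t) ** gamma shrinks the loss of well-classified samples, so easy
    # examples (typically from frequent classes) contribute less to training.
    weight = (1.0 - p_t) ** gamma
    if alpha is not None:
        weight *= alpha[target]  # optional extra weight, e.g. for rare classes
    return -weight * math.log(p_t)


logits = [2.0, 0.5, -1.0]
ce = focal_loss(logits, target=0, gamma=0.0)  # standard cross-entropy
fl = focal_loss(logits, target=0, gamma=2.0)  # confident sample is down-weighted
print(f"cross-entropy={ce:.4f}, focal={fl:.4f}")
```

Setting `gamma=0` recovers cross-entropy, which makes it easy to compare the two losses on the same predictions.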
The experiments use two datasets:
- SnakeCLEF dataset
- Danish Fungi dataset: DF20 and DF20M
The repository contains:
- Training and testing on the snake species recognition task:
  - Training script
  - Testing script
  - Training Notebook
  - Testing Notebook
  - Training script on cropped images created with the saliency-based localization method
- Training and testing on the fungi species recognition task:
  - Training script
  - Testing script
  - Training Notebook
  - Testing Notebook
  - Training script on cropped images created with the saliency-based localization method
- Data Preparation - notebooks for preparing, exploring, and cleaning the SnakeCLEF and Danish Fungi datasets.
- Side Information - notebooks for incorporating metadata. On the SnakeCLEF dataset, the method discards predictions for species that do not occur in the country where the image was taken. For fungi species recognition, the method calibrates the predictions and adjusts them by the prior probabilities of side information such as habitat, substrate, location, and time of observation.
- Informed Augmentation - notebooks that apply a weakly supervised saliency-based method to localize snakes and fungi in images.
- Venomous/Non-venomous Snake Classification - an example of using the proposed method to decide on the medical response to snake bites.
- Training Results
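The README does not give the exact formulas behind the Side Information notebooks. The sketch below shows one common way to implement both adjustments on a list of per-species probabilities; the helper names `adjust_by_prior` and `mask_by_country` and the example numbers are hypothetical. The prior adjustment assumes the image x and the metadata s are conditionally independent given the species k, which gives p(k | x, s) ∝ p(k | x) p(k | s) / p(k). The calibration step mentioned above is not shown, and the thesis procedure may differ in detail.

```python
def adjust_by_prior(probs, prior_given_meta, class_prior):
    """Re-weight classifier outputs p(k|x) with side information s.

    Assumes x and s are conditionally independent given the class k, so
    p(k|x,s) is proportional to p(k|x) * p(k|s) / p(k).
    All class_prior entries must be positive (e.g. training-set frequencies).
    """
    scores = [p * q / c for p, q, c in zip(probs, prior_given_meta, class_prior)]
    total = sum(scores)
    if total == 0:
        return list(probs)  # metadata rules out every likely class; keep original
    return [s / total for s in scores]


def mask_by_country(probs, species_in_country):
    """Zero out species not recorded in the observation's country, then renormalize.

    Falls back to the unmodified probabilities if every species would be removed.
    """
    masked = [p if k in species_in_country else 0.0 for k, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked] if total > 0 else list(probs)


# Hypothetical example with 3 species and a uniform class prior.
probs = [0.5, 0.3, 0.2]          # p(k | image) from the CNN or ViT
habitat_prior = [0.1, 0.6, 0.3]  # p(k | habitat), e.g. estimated from training metadata
uniform = [1 / 3] * 3            # p(k)
fungi_probs = adjust_by_prior(probs, habitat_prior, uniform)
snake_probs = mask_by_country(probs, species_in_country={1, 2})
print(fungi_probs, snake_probs)
```

Country masking is the special case of the prior adjustment where p(k | country) is zero for absent species and constant otherwise, which is why both helpers end with the same renormalization.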
The snake and fungi datasets used in this thesis are publicly available.
The proposed method was developed in Python 3.8 with the PyTorch 1.7.1 machine learning framework. The pre-trained CNNs come from the PyTorch Image Models library (timm 0.4.12), and the pre-trained Vision Transformers come from the Hugging Face Transformers library (transformers 4.12.3).
Additionally, the repository requires the packages numpy, pandas, scikit-learn, matplotlib, and seaborn.
To install the required packages with the CPU version of PyTorch, run:
pip install -r requirements.txt
To install them with the GPU version of PyTorch, run:
pip install -r requirements_gpu.txt
The requirement files do not include jupyterlab or any other IDE. To install jupyterlab, run:
pip install jupyterlab
Rail Chamidullin - chamidullinr@gmail.com - GitHub account