This is the code for the Cornell Birdcall Identification challenge hosted on Kaggle.
The librosa library is fairly slow at reading and transforming audio, so I read the data with librosa once and saved it as an HDF5 file. You can read more about that here.
Script for transforming .mp3 to HDF5: create/read_and_transform_audio.py
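The caching idea can be sketched roughly as follows. This is a minimal illustration, not the contents of the script: the function names, the one-dataset-per-clip layout, the gzip compression, and the 32 kHz sample rate are all assumptions made for the example. Only `librosa.load` and the `h5py` calls are real library APIs.

```python
import h5py
import numpy as np


def decode_with_librosa(path, sample_rate=32000):
    """Decode one audio file to a mono float waveform (the slow step)."""
    import librosa  # imported lazily so reading the cache does not need librosa

    wave, _ = librosa.load(path, sr=sample_rate)
    return wave


def save_waveforms(h5_path, waveforms, sample_rate):
    """Write each 1-D waveform to its own dataset, keyed by a clip id."""
    with h5py.File(h5_path, "w") as f:
        f.attrs["sample_rate"] = sample_rate
        for clip_id, wave in waveforms.items():
            f.create_dataset(
                clip_id, data=np.asarray(wave, dtype=np.float32), compression="gzip"
            )


def load_waveform(h5_path, clip_id):
    """Read one cached waveform back without decoding any audio."""
    with h5py.File(h5_path, "r") as f:
        return f[clip_id][:], int(f.attrs["sample_rate"])
```

With a layout like this, decoding happens once up front, and training code only calls `load_waveform`.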
Augmentations help models generalize better. I used the albumentations library and this Kaggle notebook to build augmentations for spectrograms.
You can find the code for this part here: modules/data/augmentations
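To give a feel for what a spectrogram augmentation does, here is a small NumPy sketch of random frequency and time masking. It is a stand-in written for illustration, not the albumentations-based transforms in modules/data/augmentations. The function name and the default mask widths are hypothetical.

```python
import numpy as np


def mask_spectrogram(spec, rng, max_freq_width=8, max_time_width=16):
    """Return a copy of a (n_freq, n_time) spectrogram with one random
    frequency band and one random time span set to the minimum value."""
    out = spec.copy()
    n_freq, n_time = out.shape
    fill = out.min()

    # Mask a horizontal band: a random run of frequency bins.
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = fill

    # Mask a vertical band: a random run of time frames.
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = fill

    return out
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the masks reproducible, which helps when debugging a data pipeline.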
I used an image-classification CNN for this task. The EfficientNet family was among the state-of-the-art models for image classification at the time of writing, so I chose it. I also used PyTorch Lightning to build the training pipeline.
Model part: modules/model
- PyTorch - Neural network framework
- PyTorch Lightning - For the training pipeline
- Albumentations - For spectrogram augmentations