Skip to content

6th place solution to Freesound Audio Tagging 2019 kaggle competition

License

Notifications You must be signed in to change notification settings

mnpinto/audiotagging2019

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

6th place solution for Freesound Audio Tagging 2019 Competition

Description of the solution: https://link.medium.com/Kv5kyHjcIX

How to use

  • Install fastai and librosa:
conda install -c pytorch -c fastai fastai
conda install librosa
  • Clone the repository:
git clone https://github.com/mnpinto/audiotagging2019.git
python run.py --n_epochs 1 --max_processors 8

If successful the script will create train_curated_png and train_noisy_png folders with the Mel spectrograms corresponding to all audio clips and train the model for 1 epochs using the default arguments. The max_processors argument will set how many processors to use to this preprocessing step. After the training is complete a folder models will be created and a weights file stage-1.pth will be saved their. Finally a submission file will be generated with the default name submission.csv.

If you find any errors let me know by creating an Issue, the code has not yet been tested on fastai versions after 1.0.51.

Arguments

name type default description
--path str data path to data folder
--working_path str . path to working folder where model weights and outputs will be saved
--base_dim int 128 size to crop the images on the horizontal axis before rescaling with SZ
--SZ int 128 images will be rescaled to SZxSZ
--BS int 64 batch size
--lr float 0.01 maximum learning rate for one_cycle_learning
--n_epochs int 80 number of epochs to train the model
--epoch_size int 1000 number of episodes (with batch size BS each) in each epoch
--f2cl int 1 train only on samples with F2 score (with threshold of 0.2) less than f2cl
--fold_number int 0 KFold cross-validation fold number: (0,1,2,3,4) or -1 to train with all data
--loss_name str BCELoss loss function to use, options are BCELoss and FocalLoss
--csv_name str submission.csv name of csv file to save with test predictions
--model str models.xresnet18 can be a fastai model as the default or xresnet{18,34,50}ssa to use simple self-attention
--weights_file str stage-1 name of file to save the weights
--load_weights str provide the name of weights file (e.g., stage-1) to load before training
--max_processors int 8 number cpu threads to use for converting wav files to png
--force bool False if set to True the pngs will be recomputed for noisy and curated train datasets

Replicating my top scoring solution

Important! This code has not yet been tested. I ran all experiments on Kaggle kernels and refactored the code to create this repository. After the final results of the competition are available, late submissions will be allowed so I will then test the code to check if anything is missing.

My top scoring solution with a score of 0.742 on public LB and 0.75421 on private LB (final results pending...) is the average of the following 6 runs:

python run.py --model xresnet18ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 80 --loss_name FocalLoss --weights_file model1 --csv_name submission1.csv
              
python run.py --model xresnet34ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 60 --loss_name FocalLoss --weights_file model2 --csv_name submission2.csv
              
python run.py --model xresnet18ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 80 --weights_file model3 --csv_name submission3.csv
              
python run.py --model xresnet34ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 60 --weights_file model4 --csv_name submission4.csv

python run.py --model models.xresnet34 --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 90 --loss_name FocalLoss --weights_file model5 --csv_name submission5.csv
              
python run.py --model models.xresnet50 --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 65 --weights_file model6_0
              
python run.py --model models.xresnet50 --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 65 --load_weights model6_0 --weights_file model6 --csv_name submission6.csv             

The penultimate run, generating model6_0 weights is not used for the ensemble, is just to generate the weights that are used to the last identical run. If you are running locally, try a single run with more epochs, the 2x65 epochs is just to accommodate for the 9h run-time limit of Kaggle kernels.

Ablation study (in progress)*

  • Fixed parameters: --base_dim 128 --SZ 256 --fold_number -1 --n_epochs 80 --loss_name FocalLoss
Model private LB scores
xresnet18ssa [0.74211, 0.74695]
xresnet34ssa [0.74545, 0.74875]
xresnet50 [0.73966, 0.74062]
  • Fixed parameters: --base_dim 128 --SZ 256 --fold_number -1 --n_epochs 80 --loss_name BCELoss
Model private LB scores
xresnet18ssa [0.73824, 0.73956]
xresnet34ssa [0.74378, 0.74567]

Citing this repository

@misc{mnpinto2019audio,
  author = {Pinto, M. M.},
  title = {6th place solution for Freesound Audio Tagging 2019 Competition},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/mnpinto/audiotagging2019}}
}

Releases

No releases published

Packages

No packages published

Languages