
Repository of buzz-based honey bee ML models for anomaly detection.


tymons/buzz-based-anomaly


Buzz-Based Anomaly


This repo contains scripts and utilities for buzz-based honey bee anomaly detection models. Different methods are used for tasks such as swarming, pest attack and queenless colony detection. For now, the models are tested against anomalies that are simply sounds from other colonies: data from the NU-HIVE project are labeled as anomalies for the ML models. This reflects a real-world scenario, as these data come from different recording devices and different bee subspecies.

Data preprocessing

To build a working dataset, run the data_prepare.py script with one of the options below. There are four data processing options:

  • NU-HIVE BEES' DATA EXTRACTION OPTION

This step extracts only the bee sound data from the NU-HIVE dataset. The new WAV files are saved in a newly created nu-hive-processed/ folder. To run this option:

$ python data_prepare.py extract-nuhive-bees --data_folder C:\NUHIVE_FOLDER
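
The idea of keeping only bee intervals can be sketched as follows. This is a minimal illustration, not the repo's code. It assumes each annotation line holds a start time, an end time and a label, with `bee` and `nobee` as the labels. The helpers `parse_label_lines` and `bee_sample_ranges` are hypothetical.

```python
# Minimal sketch: keep only the "bee" intervals of an annotated recording.
# ASSUMPTION: annotation lines look like "<start_s> <end_s> <label>" with
# labels such as "bee" / "nobee"; helper names are hypothetical.
from typing import List, Tuple

Segment = Tuple[float, float, str]


def parse_label_lines(lines: List[str]) -> List[Segment]:
    """Parse lines such as '0.0 12.5 bee' into (start, end, label) tuples."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            segments.append((float(parts[0]), float(parts[1]), parts[2]))
    return segments


def bee_sample_ranges(segments: List[Segment], sample_rate: int,
                      bee_label: str = "bee") -> List[Tuple[int, int]]:
    """Return [start, end) sample indices of segments labelled as bee sound."""
    ranges = []
    for start_s, end_s, label in segments:
        if label != bee_label or end_s <= start_s:
            continue  # skip non-bee parts and empty or invalid segments
        ranges.append((int(round(start_s * sample_rate)),
                       int(round(end_s * sample_rate))))
    return ranges


if __name__ == "__main__":
    labels = ["0.0 2.0 bee", "2.0 3.5 nobee", "3.5 5.0 bee"]
    print(bee_sample_ranges(parse_label_lines(labels), sample_rate=100))
    # -> [(0, 200), (350, 500)]
```

Each returned range can then be sliced out of the decoded audio and written as a separate WAV file.
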
  • FRAGMENT SOUNDS

Some methods in the train.py script require a coherent dataset, i.e. one in which all recordings have the same length. To split all WAV files into fragments of equal length, use the fragment-sound option. The additional --duration option sets the length of every fragment in seconds. Example:

$ python data_prepare.py fragment-sound --data_folder C:\NUHIVE_FOLDER --duration 2
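
Fixed-length fragmentation can be sketched with Python's standard `wave` module. This is an illustrative sketch, not the script's implementation. The output naming `<stem>-<index>.wav` and the choice to drop a trailing remainder shorter than `duration` are assumptions.

```python
# Minimal sketch: split a WAV file into equal-length fragments with the
# standard-library `wave` module. ASSUMPTIONS (not taken from the repo): the
# output naming <stem>-<index>.wav and dropping a trailing remainder shorter
# than `duration`.
import wave
from pathlib import Path
from typing import List


def fragment_wav(path: Path, out_dir: Path, duration: float) -> List[Path]:
    """Write consecutive `duration`-second fragments of `path` into `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with wave.open(str(path), "rb") as src:
        params = src.getparams()
        frames_per_fragment = int(duration * params.framerate)
        if frames_per_fragment <= 0:
            raise ValueError("duration is too short for this sampling rate")
        bytes_per_fragment = frames_per_fragment * params.sampwidth * params.nchannels
        index = 0
        while True:
            chunk = src.readframes(frames_per_fragment)
            if len(chunk) < bytes_per_fragment:
                break  # incomplete tail: discarded so all fragments are equal
            out_path = out_dir / f"{path.stem}-{index}.wav"
            with wave.open(str(out_path), "wb") as dst:
                dst.setparams(params)  # same rate, sample width and channels
                dst.writeframes(chunk)  # header frame count is patched on close
            written.append(out_path)
            index += 1
    return written
```

For example, `fragment_wav(Path("hive.wav"), Path("fragments"), duration=2)` turns a 5-second recording into two 2-second files and drops the last second.
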
  • UPSAMPLED SOUNDS

Machine learning models require uniform input; for example, spectrograms must have the same shape across the whole dataset. To handle this, use the upsample-sound option, which upsamples all audio files in a folder to the given sampling rate. Audio files that already meet the sampling rate requirement are preserved. All other files are upsampled and saved as new files, and the original (parent) files are deleted.

$ python data_prepare.py upsample-sound --data_folder C:\NUHIVE_FOLDER --sampling_rate 44100
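
The "upsample or keep" rule can be sketched as follows. This is not the repo's resampler. Plain linear interpolation stands in for the band-limited resampling a real audio library would perform. The code also assumes that "already satisfies the requirement" means the file rate is at least the target rate. Both function names are hypothetical.

```python
# Minimal sketch of the "upsample or keep" rule. ASSUMPTIONS: linear
# interpolation is a simplified stand-in for a proper band-limited resampler,
# and files at or above the target rate count as already satisfying it.
from typing import List, Sequence


def needs_upsampling(file_rate: int, target_rate: int) -> bool:
    """Files already at (or above) the target rate are preserved unchanged."""
    return file_rate < target_rate


def upsample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int) -> List[float]:
    """Resample `samples` from src_rate to dst_rate by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)  # nothing to do
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input-sample distance between output samples
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # hold the last sample at the edge
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, `upsample_linear([0.0, 1.0, 2.0], 1, 2)` doubles the rate and returns `[0.0, 0.5, 1.0, 1.5, 2.0, 2.0]`.
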
  • DOWNLOAD SMARTULA DATA

You can download data from the Smartula server to train your own models. Note that this requires the SMARTULA_API and SMARTULA_TOKEN environment variables to be set. Raw Smartula data are preprocessed to reject samples that are too quiet or distorted.

$ python data_prepare.py --start YYYY-MM-DD --end YYYY-MM-DD --smartula_hives DEADBEEF99
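
The two checks can be sketched as follows: the required environment variables, and rejection of quiet or distorted samples. The repo's actual rejection criteria are not documented here. The RMS and clipping thresholds below are hypothetical values, and the samples are assumed to be floats in [-1.0, 1.0].

```python
# Minimal sketch of the checks described above. ASSUMPTIONS: the RMS and
# clipping thresholds are hypothetical values, and samples are floats in
# [-1.0, 1.0]; the repo's actual rejection criteria may differ.
import math
import os
from typing import Sequence, Tuple

SILENCE_RMS = 0.01     # hypothetical: below this RMS a sample is "too quiet"
CLIP_LEVEL = 0.99      # hypothetical: |x| at or above this counts as clipped
MAX_CLIP_RATIO = 0.01  # hypothetical: over 1% clipped values means "distorted"


def smartula_credentials() -> Tuple[str, str]:
    """Read the required environment variables, failing early if one is unset."""
    try:
        return os.environ["SMARTULA_API"], os.environ["SMARTULA_TOKEN"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


def is_usable(samples: Sequence[float]) -> bool:
    """Reject recordings that are too quiet or heavily clipped."""
    if not samples:
        return False
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms < SILENCE_RMS:
        return False
    clipped = sum(1 for x in samples if abs(x) >= CLIP_LEVEL)
    return clipped / len(samples) <= MAX_CLIP_RATIO
```
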

Model training

The train.py script is the entry point for model training. Currently, only the models and sound features listed below are supported. Names in parentheses are the values passed directly as arguments to the train.py script.

Vanilla Autoencoders

  • ✅ Autoencoder (autoencoder)
    • Periodogram (periodogram)
  • ✅ Convolutional 1D Autoencoder (conv1d_autoencoder)
    • Periodogram (periodogram)
  • ✅ Convolutional 2D Autoencoder (conv2d_autoencoder)
    • Spectrogram (spectrogram)
    • MelSpectrogram (melspectrogram)
    • MFCC (mfcc)

Variational Autoencoders

  • ✅ Variational Autoencoder (vae)
    • Periodogram (periodogram)
  • ✅ Convolutional 1D Variational Autoencoder (conv1d_vae)
    • Periodogram (periodogram)
  • ✅ Convolutional 2D Variational Autoencoder (conv2d_vae)
    • Spectrogram (spectrogram)
    • MelSpectrogram (melspectrogram)
    • MFCC (mfcc)

Contrastive Autoencoders

Note that contrastive learning requires the --contrastive_data_folder argument. All data in the contrastive data folder are transformed into the feature passed as the second positional argument of train.py (after the model name). The contrastive data are then shuffled and truncated to the length of the original dataset.

For now, contrastive data should come from the same source and have the same parameters (e.g. sampling frequency, sound length) as the target data.
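
The shuffle-and-truncate step can be sketched as follows. The function name `align_contrastive` and the optional `seed` parameter are hypothetical. If the contrastive set is smaller than the target set, this sketch returns all of it without padding.

```python
# Minimal sketch of the pairing rule described above: contrastive samples are
# shuffled and truncated to the size of the target dataset. ASSUMPTIONS: the
# function name and the optional seed are illustrative, not the repo's API.
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def align_contrastive(target: Sequence[object], contrastive: Sequence[T],
                      seed: Optional[int] = None) -> List[T]:
    """Return a shuffled copy of `contrastive` truncated to len(target)."""
    shuffled = list(contrastive)           # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, reproducible when seeded
    return shuffled[:len(target)]
```

The local `random.Random` instance avoids changing the global random state, which keeps runs reproducible when a seed is given.
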

  • ✅ Contrastive Autoencoder (contrastive_autoencoder)
    • Periodogram (periodogram)
  • ✅ Contrastive Convolutional 1D Autoencoder (contrastive_conv1d_autoencoder)
    • Periodogram (periodogram)
  • ✅ Contrastive Convolutional 2D Autoencoder (contrastive_conv2d_autoencoder)
    • Spectrogram (spectrogram)
    • MelSpectrogram (melspectrogram)
    • MFCC (mfcc)
  • ✅ Contrastive Variational Autoencoder (contrastive_vae)
    • Periodogram (periodogram)
  • ✅ Contrastive Convolutional Variational 1D Autoencoder (contrastive_conv1d_vae)
    • Periodogram (periodogram)
  • ✅ Contrastive Convolutional Variational 2D Autoencoder (contrastive_conv2d_vae)
    • Spectrogram (spectrogram)
    • MelSpectrogram (melspectrogram)
    • MFCC (mfcc)
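
The model/feature combinations listed above can be written as a lookup table. Such a table could validate the model and feature arguments before a long training run starts. The helper `check_combination` is hypothetical, and train.py's own argument validation may work differently.

```python
# The supported model/feature combinations listed above as a lookup table.
# ASSUMPTION: `check_combination` is an illustrative helper, not part of
# train.py; the script's own argument validation may differ.
PERIODOGRAM = frozenset({"periodogram"})
IMAGE_FEATURES = frozenset({"spectrogram", "melspectrogram", "mfcc"})

SUPPORTED = {
    "autoencoder": PERIODOGRAM,
    "conv1d_autoencoder": PERIODOGRAM,
    "conv2d_autoencoder": IMAGE_FEATURES,
    "vae": PERIODOGRAM,
    "conv1d_vae": PERIODOGRAM,
    "conv2d_vae": IMAGE_FEATURES,
    "contrastive_autoencoder": PERIODOGRAM,
    "contrastive_conv1d_autoencoder": PERIODOGRAM,
    "contrastive_conv2d_autoencoder": IMAGE_FEATURES,
    "contrastive_vae": PERIODOGRAM,
    "contrastive_conv1d_vae": PERIODOGRAM,
    "contrastive_conv2d_vae": IMAGE_FEATURES,
}


def check_combination(model: str, feature: str) -> None:
    """Raise ValueError for an unknown model or an unsupported feature."""
    if model not in SUPPORTED:
        raise ValueError(f"unknown model: {model}")
    if feature not in SUPPORTED[model]:
        allowed = ", ".join(sorted(SUPPORTED[model]))
        raise ValueError(f"{model} supports only: {allowed}")
```
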

Docker support

Ready-to-use Docker images are available on Docker Hub under tymonzz/buzz-based-anomaly, with the entrypoint set to the train.py script. Example:

$ docker run -d --name buzz-based-anomaly \
              -v /ssd_local/142847ct/buzz-based-anomaly-dataset/smartula:/data \
              -v /home/macierz/142847ct/142847ct/research/buzz-based-anomaly:/io \
              --gpus all \
              -t tymonzz/buzz-based-anomaly:cuda-11.1 \
              conv1d_autoencoder periodogram /data \
              --filter_hives DEADBEEF93 DEADBEEF94 DEADBEEF95 \
              --log_folder /io/output/logs \
              --model_output /io/output/models \
              --model_config /io/input/model_config.yml \
              --feature_config /io/input/feature_config.yml \
              --learning_config /io/input/learning_config.yml \
              --comet_config /io/input/comet.config \
              --find_best 4

Note that if you are using Git for Windows (Git Bash) or another MSYS-based shell emulator, you should set the MSYS_NO_PATHCONV=1 environment variable. Otherwise the paths for the data and output volumes are not parsed correctly.

Building your own Docker image

To build a CUDA-capable Docker image, use the build_docker.sh script. By default, the script parses your host CUDA version and pulls the nvidia/cuda image tagged with the same version. Likewise, the PyTorch CUDA toolkit is installed in the version matching your CUDA installation. Note that the host CUDA version is detected in MAJOR.MINOR format. There may be no nvidia/cuda Docker image or Anaconda PyTorch CUDA toolkit that matches that exact version.

To override the default CUDA versions, use the -p option for PyTorch and -d for the nvidia/cuda image. For example, to use PyTorch CUDA toolkit v10.2 and the nvidia/cuda Docker image v11.2.0:

$ ./build_docker.sh -p 10.2 -d 11.2.0
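
The version-selection logic can be sketched in Python. This is a rough illustration, not a port of build_docker.sh. It assumes the host version is read from the `CUDA Version: X.Y` field of the `nvidia-smi` header. The function names are hypothetical.

```python
# Minimal sketch of the version selection described above. ASSUMPTIONS: the
# host version comes from the "CUDA Version: X.Y" field of nvidia-smi output,
# and the function names are hypothetical; build_docker.sh may differ.
import re
from typing import Optional, Tuple

_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+\.\d+)")


def host_cuda_version(nvidia_smi_output: str) -> Optional[str]:
    """Extract MAJOR.MINOR from nvidia-smi text, or None if not found."""
    match = _CUDA_RE.search(nvidia_smi_output)
    return match.group(1) if match else None


def resolve_versions(host: str, pytorch: Optional[str] = None,
                     docker: Optional[str] = None) -> Tuple[str, str]:
    """Explicit -p / -d style overrides win; otherwise use the host version."""
    return (pytorch or host, docker or host)
```

Without overrides, both versions resolve to the bare MAJOR.MINOR host value, such as `11.2`. The example above instead passes `11.2.0` for the nvidia/cuda image.
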
