Exploring-AAD: Audio Anomaly Detection in Machine Sounds using Chroma and Mel Frequency Features: CENS, Log Mel Frequency, and MFCC

This project explores and compares methods for unsupervised audio anomaly detection in machine operation noise, using the MIMII Dataset. Audio anomaly detection is useful for ensuring equipment safety and identifying potential faults.

This README provides an overview of the project, including its objectives, feature space, machine learning models, and key results.


Inspired by Previous (thesis) Project

The methods employed in this project draw inspiration from my Master's Thesis in Mathematical Statistics, where I researched audio anomaly detection in cars. The motivation for using chromagram-based features stems from interactions with professionals in the Noise, Vibration, and Harshness (NVH) field during my thesis research. I noticed that, when characterizing abnormal noises in vehicles, professionals often resorted to descriptors like "humming" or "clicking". These descriptions led me to explore musical analysis and classification as a viable avenue for audio anomaly detection. The chromagram feature was thus integrated into the project's methodology and gave promising results.

Approach

Chroma and CENS

Chroma features, inspired by insights from musical analysis (for more info, see intro or longer intro), have been employed in this project as a way of capturing distinct information within audio data. These features are based on the twelve pitch classes (C, C♯, D, ..., B) used in Western music notation. They measure how the energy in each frame of an audio signal is distributed across these twelve chroma bands.

To obtain chroma energy normalized statistics (CENS), a smoothing window of length ℓ (similar to a Hann window) is applied, computing local weighted averages for each of the twelve chroma components. This results in a sequence of 12-dimensional vectors with nonnegative entries. The sequence is then downsampled by a factor of d, and the resulting vectors are normalized with respect to the Euclidean norm (ℓ2-norm). For instance, consider an original chroma sequence with a rate of 10 Hz. With ℓ = 41, corresponding to a window size of 4100 milliseconds, and downsampling parameter d = 10, the feature rate reduces to 1 Hz. The resulting CENS sequences have a lower dimension while still retaining the important information.
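
As a rough illustration (not the project's exact pipeline), the sketch below shows how CENS and log-mel features could be extracted per clip with librosa; the file path, hop length, and number of mel bands are placeholder assumptions.

import numpy as np
import librosa

# Hypothetical path to one MIMII clip (the dataset provides 10-second, 16 kHz recordings).
y, sr = librosa.load("data/fan/id_00/normal/00000000.wav", sr=16000)

# CENS: smoothed (win_len_smooth = 41 frames) and l2-normalized chroma, shape (12, n_frames).
cens = librosa.feature.chroma_cens(y=y, sr=sr, hop_length=512, win_len_smooth=41)
cens = cens[:, ::10]  # librosa does not downsample internally, so decimate by d = 10 here

# Log-mel spectrogram for comparison, shape (n_mels, n_frames).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

# Mean features: average each band over time, giving one vector per clip.
cens_means = cens.mean(axis=1)    # 12 values
mel_means = log_mel.mean(axis=1)  # 64 values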

Models

To assess how well chroma and CENS features can detect anomalies in machine sounds, this project compares the performance of CENS features with the more commonly used Mel spectrogram, using three unsupervised learning models: Autoencoders (AE), Isolation Forest (IF), and Local Outlier Factor (LOF), applied to the abnormal and normal machine sound clips of fans and valves from the MIMII Dataset. Below are short intros to the models, followed by a minimal usage sketch.

  • Autoencoders:

    • A type of neural network architecture.
    • Effective for handling high-dimensional audio features.
    • Works by learning to reconstruct its input through a compressed representation.
    • Utilizes the reconstruction error as a measure of deviation from the norm: abnormal inputs are reconstructed poorly.
  • Isolation Forest (IF):

    • An unsupervised ensemble learning algorithm.
    • Assumes that anomalous data points are rare and different.
    • Divides data into subspaces to isolate anomalies.
    • Provides an anomaly score based on the number of splits needed to isolate a data point.
  • Local Outlier Factor (LOF):

    • Detects anomalies by measuring local density deviations.
    • Compares data points to their neighbors.
    • Not commonly used for audio data but has shown strong performance on high-dimensional spectral data, as demonstrated by Yu et al. in this study.
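
As a rough, hedged sketch (not the project's exact training code), the snippet below shows how the Isolation Forest and LOF detectors could be applied to per-clip mean feature vectors; the feature files and hyperparameters are placeholder assumptions.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical precomputed mean features, one row per clip:
# X_train holds (presumed normal) training clips, X_test holds held-out normal + abnormal clips.
X_train = np.load("features/fan_train_means.npy")
X_test = np.load("features/fan_test_means.npy")

# Isolation Forest: anomalies need fewer random splits to isolate,
# so negating score_samples gives "higher = more anomalous".
iso = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
iso_scores = -iso.score_samples(X_test)

# LOF in novelty mode: compares each point's local density to that of its neighbors.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
lof_scores = -lof.score_samples(X_test)
lof_labels = lof.predict(X_test)  # -1 = abnormal, 1 = normal, matching the tables below

An autoencoder would be used analogously, scoring each clip by its reconstruction error.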

Results

Below are some key results from using Mel frequency and chroma features on the fan and valve datasets, which give interesting insights into the performance of the various models and feature combinations for machine sound anomaly detection:

  • Mean Features: Across all models, mean features performed exceptionally well, with LOF using all feature means, and even just the mel feature means, achieving near-perfect classification. This demonstrates the effectiveness of mean-based statistics in capturing abnormal patterns.

  • Autoencoder (AE) Improvement with Chroma: For fan machine sound anomaly detection, it is interesting to observe a substantial improvement in the F1 score of the AE model when using chroma features compared to mel spectrogram features. This improvement may be because the chromagrams of abnormal and normal sound clips look more different than their mel spectrograms do. The bar plot of the means (see the figure below) shows significant differences in mean pitch energy between the classes for G and G♯. To see the encoding and decoding of the spectrograms themselves, check out the notebooks!

  • Valve Machine Sound Anomaly Detection: The valve machine sound detection results show comparatively lower metrics, indicating the complexity of the anomaly detection task: a feature space and model combination that works well in one context does not necessarily transfer to all sound anomaly detection tasks. LOF with various feature combinations shows better performance, but the overall metrics for valve machine sound detection remain modest.

  • LOF Dominance (!!): Across both machine types and all feature combinations, the LOF model consistently outperforms the other models in terms of F1 score, with LOF using chroma means for fan sounds achieving an abnormal F1 score of 0.851983 as well as an AUC score very close to 1. This highlights the usefulness of LOF for audio anomaly detection.

In summary, these results show the usefulness of mean-based features, the advantage of chroma features for AE models in specific scenarios, and the dominance of LOF as a robust choice for audio anomaly detection, achieving near-perfect classification even with mel features alone. While the project aimed to explore the utility of chroma features in machine anomaly detection, the results suggest that chroma features hold substantial promise when coupled with certain models such as the AE, while LOF consistently delivers excellent results.
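
For reference, the AUC and abnormal-class F1 values reported in the tables below can be computed roughly as in this self-contained sketch; the toy arrays simply stand in for real detector outputs.

import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([1, 1, 1, -1, -1, 1, -1, 1])                  # ground truth: -1 = abnormal, 1 = normal
scores = np.array([0.1, 0.2, 0.15, 0.9, 0.8, 0.3, 0.7, 0.05])   # anomaly scores (higher = more anomalous)
labels = np.where(scores > 0.5, -1, 1)                          # hard predictions from some threshold

auc = roc_auc_score(y_true == -1, scores)             # AUC over anomaly scores
abnormal_f1 = f1_score(y_true, labels, pos_label=-1)  # F1 of the abnormal (-1) class
print(f"AUC={auc:.3f}, abnormal F1={abnormal_f1:.3f}")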

Fan machine sound anomaly detection - All models and feature combos

Sorted by best F1 score of the abnormal class -1

| Model | Machine | Features | AUC | Precision | Recall | Abnormal (-1) F1 |
|---|---|---|---|---|---|---|
| LOF | fan | All-means | 0.99597 | 0.946133 | 0.933876 | 0.889246 |
| LOF | fan | mel-means | 0.994428 | 0.94199 | 0.927887 | 0.880223 |
| LOF | fan | chroma-means | 0.982158 | 0.92743 | 0.909679 | 0.851983 |
| Isolation Forest | fan | All-means | 0.945388 | 0.891961 | 0.85769 | 0.779182 |
| LOF | fan | mel | 0.947371 | 0.885866 | 0.83517 | 0.756201 |
| AE | fan | chroma | 0.823611 | 0.916898 | 0.861765 | 0.725539 |
| AE | fan | mel | 0.696196 | 0.831592 | 0.884314 | 0.556539 |

Valve machine sound anomaly detection - All models and feature combos

Sorted by best F1 score of the abnormal class -1

| Model | Machine | Features | AUC | Precision | Recall | Abnormal (-1) F1 |
|---|---|---|---|---|---|---|
| LOF | valve | All-means | 0.665219 | 0.859019 | 0.414323 | 0.258385 |
| LOF | valve | mel-means | 0.666815 | 0.859777 | 0.402551 | 0.25641 |
| LOF + PCA | valve | All-means | 0.641049 | 0.851031 | 0.362982 | 0.243201 |
| LOF + PCA | valve | mel-means | 0.618747 | 0.839748 | 0.339111 | 0.233017 |
| LOF | valve | chroma-means | 0.575024 | 0.83062 | 0.292675 | 0.223339 |
| AE | valve | chroma | 0.497541 | 0.882027 | 0.883661 | 0.112202 |
| AE | valve | mel | 0.494097 | 0.881232 | 0.8907 | 0.101597 |

Figure (#fig-means): Mean CENS by machine type.


Usage

  1. Clone this repository to your local machine:
git clone https://github.com/AHruler/Exploring-AAD.git
cd Exploring-AAD
  2. Download the fan and valve machine sound datasets from the MIMII Dataset and add them to a ./data directory.

  3. Run the Jupyter notebooks provided in the notebooks directory to explore the methods and reproduce the results.

  4. Experiment with different configurations, models, and datasets to further explore audio anomaly detection techniques.

Package Requirements

To run the code and reproduce the results, ensure you have the following Python packages installed:

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn
  • tensorflow
  • torchaudio (for audio processing and audio feature extraction)
  • librosa (for audio feature extraction)
  • tqdm
  • tabulate (for printing tables)

You can install these packages using pip:

pip install numpy pandas matplotlib seaborn scikit-learn tensorflow torchaudio librosa tqdm tabulate

Using conda you can replicate the environment in environment.yml:

conda env create -n ENVNAME --file environment.yml

License

This project is licensed under the MIT License.

Acknowledgments

  • The MIMII dataset for providing valuable audio data for experimentation.