This repository introduces applications of class-based Activation Maximization (AM) in the audio domain, published in the American Journal of Computer and Technology.
Neural networks are predominant in various tasks, including object detection, speech recognition, and emotion detection. However, their decision process is generally not understandable to humans. To understand how these models tackle their problems, visualization techniques such as feature visualization have been developed. In this repository, I share applications of Activation Maximization (AM), one of these feature-visualization techniques.
In AM, the input data is optimized so that it activates a selected neuron, which can be a filter in a layer, a classification output, and so on. In our case, the classifier's output is optimized to observe what the model considers representative of a certain class. That is why I call it class-based Activation Maximization, as mentioned in this paper. For further information, please visit this excellent explanation of AM.
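The core loop can be sketched as gradient ascent on the target class score. The snippet below is a toy illustration with a random linear "classifier" (all names and sizes are illustrative, not the notebooks' code), but the update rule is the same one applied to a neural classifier in the notebooks.

```python
import numpy as np

# Toy sketch of class-based AM: maximize the score of one class of a
# linear "classifier" W by gradient ascent on the input x.
# For a linear model, d(W[c] @ x)/dx = W[c], so the update is explicit.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))    # 4 classes, 16-dim input (toy sizes)
x = rng.standard_normal(16)         # the input being optimized
target, lr = 2, 0.1                 # class index and step size (illustrative)

score_before = W[target] @ x
for _ in range(100):
    x += lr * W[target]             # gradient ascent on the class score
score_after = W[target] @ x         # the target class score has grown
```

With a real network the gradient is obtained by backpropagation instead of the closed form above, but the iteration is identical.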
In this experiment, I optimize the noise input of a GAN, which is employed as a prior, as shown below. Two types of audio features are employed: raw audio and mel-spectrograms. We observe how the results differ between these data forms and model structures. Furthermore, a Conditional GAN is also tested to figure out how important it is to express a certain emotion. Lastly, the biggest advantage of this idea is that it can be used to enhance the model's output: in our case, the model could not generate the audio we expected on its own, but this concept allowed us to steer its output toward our purpose.
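Under the GAN-prior setup described above, the latent noise z is optimized instead of the audio itself, so every iterate stays on the generator's output manifold. A minimal sketch with toy linear stand-ins for the generator and the classifier (all names, sizes, and values are illustrative, not the notebooks' models):

```python
import numpy as np

# Toy sketch of AM with a GAN prior: optimize the latent z, not the audio.
# G plays the generator (latent 8 -> "audio" 16), C the emotion
# classifier (16 -> 4 scores); both are random linear maps here.
rng = np.random.default_rng(1)
G = rng.standard_normal((16, 8))
C = rng.standard_normal((4, 16))
z = rng.standard_normal(8)
target, lr = 3, 0.05                 # e.g. the "happy" class (illustrative)

grad = G.T @ C[target]               # d/dz of C[target] @ (G @ z)
before = C[target] @ (G @ z)
for _ in range(200):
    z += lr * grad                   # ascend the target emotion score
after = C[target] @ (G @ z)          # the generated sample now scores higher
```

Because the optimized variable is z rather than the audio, the result is always something the generator can produce, which is what makes this usable as an output enhancer.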
This idea requires two models: a classifier and a generator from a GAN (or Conditional GAN). Brief descriptions of the notebooks are as follows:
- `01_audio_emotion_classifier.ipynb`: emotion classification in the audio domain
- `02_GAN_training.ipynb`: training of the GAN
- `03_GAN_audio_AM`: Activation Maximization on raw audio with the GAN
- `04_mel_emotion_classifier.ipynb`: emotion classification on mel-spectrograms
- `05_GAN_mel_AM`: Activation Maximization on mel-spectrograms with the GAN
- `06_result_GAN_AM`: summary of the Activation Maximization with the GAN
- `A_preprocessing_TESS_and_RAVDESS`: brief introduction and preprocessing of the datasets
- `A-download_Download_TESS_RAVDESS`: how to download the TESS and RAVDESS datasets
- `B_WaveGlow_parameters`: obtaining the parameters of WaveGlow
- `C_Emotion_Recognition-Inception`: emotion classification with an Inception model
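Notebooks 04 and 05 work on mel-spectrograms rather than raw audio. The notebooks themselves presumably rely on an audio library for this; the self-contained sketch below only illustrates how a mel-spectrogram is computed, with illustrative parameter values.

```python
import numpy as np

def mel_spectrogram(wave, sr=16000, n_fft=512, hop=128, n_mels=40):
    """Toy mel-spectrogram: windowed power spectrum projected onto a
    triangular mel filterbank. A sketch of the idea, not the repo's code."""
    # short-time power spectrum with a Hann window
    frames = [wave[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(wave) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (frames, n_fft//2 + 1)

    # triangular filters spaced evenly on the mel scale
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        if c > l:
            fb[m - 1, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fb[m - 1, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge

    return spec @ fb.T                                # (frames, n_mels)

# example: one second of a 440 Hz tone
t = np.arange(16000) / 16000
m = mel_spectrogram(np.sin(2 * np.pi * 440 * t))
```

The mel form compresses frequency into perceptually spaced bands, which is why the mel notebooks can use 2-D image-style classifiers on audio.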
Since I'm not allowed to post audio data in the README, I've posted the audio on my blog.
Please visit GAN/notebook/06_result_GAN_AM.ipynb or GAN/notebook/06-A_result_cGAN_AM.ipynb for additional results and discussions.
neutral
sad
angry
happy
- Perform AM while fixing the text information.
- Employ a model capable of adding emotion to audio, and use it as a prior.
In this repository, we share an environment in which you can run the notebooks.
- Build the docker environment.
- with GPU
```shell
docker build --no-cache -f Docker/Dockerfile.gpu .
```
- without GPU
```shell
docker build --no-cache -f Docker/Dockerfile.cpu .
```
- Check the <IMAGE ID> of the created image.
```shell
docker images
```
- Run the docker environment
- with GPU
```shell
docker run --rm --gpus all -it -p 8080:8080 -e LOCAL_UID=$(id -u $USER) -e LOCAL_GID=$(id -g $USER) -v ~/:/work <IMAGE ID> bash
```
- without GPU
```shell
docker run --rm -it -p 8080:8080 -e LOCAL_UID=$(id -u $USER) -e LOCAL_GID=$(id -g $USER) -v ~/:/work <IMAGE ID> bash
```
- Run Jupyter Lab
```shell
nohup jupyter lab --ip=0.0.0.0 --no-browser --allow-root --port 8080 --NotebookApp.token='' > nohup.out &
```
- Open Jupyter Lab
- Enter http://localhost:8080/lab? into your web browser.
Git LFS (Large File Storage)
Since this repository contains the parameters of the models, I used Git LFS to store the large files. The commands below are the recipe for this.
```shell
brew update
brew install git-lfs
```
- Then navigate to this repository and run:
```shell
git lfs install
git lfs fetch --all
git lfs pull
```
Some parts are not explained in detail, including:
- explanations of some functions and models.
Feel free to contact me if you have any questions (s-inoue-tgz@eagle.sophia.ac.jp).