This repository is the official implementation of DENet: a deep architecture for audio surveillance applications.
For more information you can contact the authors at: agreco@unisa.it, aroberto@unisa.it, asaggese@unisa.it, mvento@unisa.it .
If you use this code in your research, please cite this paper.
@article{greco2021denet,
title={DENet: a deep architecture for audio surveillance applications},
author={Greco, Antonio and Roberto, Antonio and Saggese, Alessia and Vento, Mario},
journal={Neural Computing and Applications},
doi={10.1007/s00521-020-05572-5},
pages={1--12},
year={2021},
publisher={Springer}
}
DENet is a novel Recurrent Convolutional Neural Network architecture for audio surveillance applications. It is based on a new layer that we call Denoising-Enhancement (DE) Layer, which performs denoising and enhancement of the original signal by applying an attention map on the components of the band-filtered signal. Differently from state of the art methodologies, DENet takes as input the lossless raw waveform and is able to automatically learn the evolution of the frequencies-of-interest over time, by combining the proposed layer with a Bidirectional Gated Recurrent Unit. Using the feedbacks coming from classifications related to consecutive frames (i.e. that belong to the same event), the proposed method is able to drastically reduce the misclassifications.
- tensorflow-gpu==1.13.1
- keras==2.2.4
- numpy==1.19.1
To install the requirements:
git clone https://github.com/MiviaLab/DENet.git
cd DENet
pip install -r requirements.txt
get_denet(input_shape, n_classes, sr=16000, before_pooling=True, dropout=0.3)
- input_shape: tuple in the form (seq_len, samples, 1)
- n_classes: number of dense units in the last layer
- sr: input sampling rate
- before_pooling: set it to False to put the DELayer after the MaxPooling and the Activation Layers
- dropout: dropout probability for all the Dropout layers in the network
import numpy as np
from denet import get_denet
# Settings
batch_size = 100
seq_len = 10 # number of frames in the sequence
samples = 400 # frame_size * sample_rate
input_shape = (seq_len, samples, 1)
sample_rate = 16000
n_classes = 10
# Get the model
model = get_denet(input_shape, n_classes, sr=sample_rate, before_pooling=False)
# Print the model
model.summary()
# Predict random data
X = np.random.rand(batch_size, seq_len, samples, 1)
y = model.predict(X)
print(y.shape)
Our model achieves the following performance on :
- RR: Recognition Rate (Recall)
- MR: Miss Rate
- ER: Error Rate
- FPR: False Positive Rate
Method | RR | MR | ER | FPR |
---|---|---|---|---|
DENet | 0.975 | 0.014 | 0.011 | 0.029 |
SincNet | 0.971 | 0.019 | 0.010 | 0.029 |
COPE | 0.960 | 0.031 | 0.009 | 0.043 |
SoundNet | 0.933 | 0.007 | 0.060 | 0.223 |
Method | RR | MR | ER | FPR |
---|---|---|---|---|
DENet (Fine-Tuning) | 0.998 | 0.002 | 0.000 | 0.043 |
MobileNet (Fine-Tuning) | 0.995 | 0.000 | 0.005 | 0.037 |
DENet | 0.975 | 0.025 | 0.000 | 0.021 |
MobileNet | 0.965 | 0.010 | 0.028 | 0.067 |
COPE | 0.940 | 0.048 | 0.012 | 0.067 |
SincNet | 0.773 | 0.200 | 0.027 | 0.010 |
The code and mode are available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License(https://creativecommons.org/licenses/by/4.0/).
Downloading this code implies agreement to follow the same conditions for any modification
and/or re-distribution of the dataset in any form.
Additionally any entity using this code agrees to the following conditions:
THIS CODE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Please cite the paper if you make use of the dataset and/or code.