Project for the Advanced Audio Processing course at Tampere University. In this project, we explored the effectiveness of multi-task learning for audio classification tasks. Our model uses a hard parameter sharing architecture: all hidden layers are shared between tasks, while each task keeps its own output layer. We compared our multi-task model with two single-task models trained separately for gender and digit classification. The multi-task model performed comparably to the single-task models, with mean accuracy about two percentage points lower on each task, as shown in the table below.
Model | Task | Accuracy | Precision | Recall
---|---|---|---|---
Single-Task | Gender Classification | 97.847% ± 1.485% | 0.987 ± 0.014 | 0.986 ± 0.016
Single-Task | Digit Classification | 98.671% ± 0.862% | 0.987 ± 0.009 | 0.987 ± 0.009
Multi-Task | Gender Classification | 95.84% ± 2.898% | 0.978 ± 0.025 | 0.97 ± 0.025
Multi-Task | Digit Classification | 96.766% ± 1.805% | 0.968 ± 0.018 | 0.968 ± 0.018
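
The hard parameter sharing idea described above can be illustrated with a minimal, framework-free sketch. This is not the project's actual model: the feature size (40), the single shared hidden layer of 16 units, the 2 gender classes, the 10 digit classes, and the equally weighted sum of the two cross-entropy losses are all assumptions chosen for illustration, and the names `MultiTaskNet` and `joint_loss` are hypothetical.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Hypothetical network: shared hidden layer, one output head per task."""

    def __init__(self, n_features=40, n_hidden=16, seed=0):
        rng = random.Random(seed)
        # Shared parameters: used by both tasks.
        self.shared = init_layer(n_features, n_hidden, rng)
        # Task-specific output layers (class counts are assumptions).
        self.gender_head = init_layer(n_hidden, 2, rng)
        self.digit_head = init_layer(n_hidden, 10, rng)

    def forward(self, x):
        h = relu(dense(x, *self.shared))  # shared representation
        gender_probs = softmax(dense(h, *self.gender_head))
        digit_probs = softmax(dense(h, *self.digit_head))
        return gender_probs, digit_probs


def joint_loss(gender_probs, digit_probs, gender_label, digit_label):
    """Sum of the two cross-entropy losses (equal weighting is an assumption)."""
    return -math.log(gender_probs[gender_label]) - math.log(digit_probs[digit_label])


# Usage: one random feature vector standing in for an audio clip's features.
rng = random.Random(1)
features = [rng.gauss(0.0, 1.0) for _ in range(40)]
net = MultiTaskNet()
gender_probs, digit_probs = net.forward(features)
loss = joint_loss(gender_probs, digit_probs, gender_label=1, digit_label=7)
print(len(gender_probs), len(digit_probs))
```

Because both heads read the same hidden representation, a gradient step on the joint loss would update the shared layer using signals from both tasks, which is the defining property of hard parameter sharing. The single-task baselines correspond to training the same network with only one of the two heads.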