This is a demo of FullSubNet speech enhancement (SE) for Vietnamese automatic speech recognition (ASR). The SE model was trained on about 1,300 hours of speech data using dynamic mixing (mixing on the fly).
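With dynamic mixing, noisy training inputs are not precomputed. Instead, each time a training example is drawn, a clean utterance is paired with a randomly chosen noise clip at a randomly chosen signal-to-noise ratio (SNR), so the model rarely sees the same mixture twice. The sketch below illustrates the general technique only. It is not the training code used for this model, and the function names (mix_at_snr, dynamic_mix_batch), the use of in-memory NumPy arrays, and the -5 to 20 dB SNR range are assumptions chosen for illustration.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float,
               rng: np.random.Generator) -> np.ndarray:
    """Add noise to clean speech so the mixture has the requested SNR (in dB)."""
    # Tile short noise clips so they cover the whole utterance.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    # Take a random noise segment of the same length as the clean signal.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    # Scale the noise so that clean_power / scaled_noise_power = 10^(snr_db / 10).
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def dynamic_mix_batch(clean_pool, noise_pool, batch_size,
                      snr_range=(-5.0, 20.0), rng=None):
    """Build a fresh batch of (noisy, clean, snr) triples on the fly.

    The SNR range is a hypothetical default, not the value used for this model.
    """
    rng = rng or np.random.default_rng()
    batch = []
    for _ in range(batch_size):
        clean = clean_pool[rng.integers(len(clean_pool))]
        noise = noise_pool[rng.integers(len(noise_pool))]
        snr = rng.uniform(*snr_range)
        batch.append((mix_at_snr(clean, noise, snr, rng), clean, snr))
    return batch
```

Because pairs and SNRs are resampled every batch, a fixed pool of clean speech and noise yields an effectively unbounded set of training mixtures, which is the usual motivation for mixing on the fly instead of storing a precomputed noisy corpus.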
We also propose an approach to a known problem with using SE as a front-end for ASR: enhancement can degrade ASR decoding performance on clean speech. The idea is simple and can be applied to any combination of SE and ASR models.
- First, download the model checkpoints from this link.
- Move the downloaded folder to the project root directory and rename it to "checkpoints".
- Set up the environment either with Docker or manually.
Docker build
docker build -t demo .
Docker run
docker run demo
Install packages (manual setup)
pip install -r requirements.txt
Run the Flask app
python app.py -c config.json
See example.py for an example of API usage.