
CIF-HieraDist

Introduction

[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

🚀🚀 This repository is the official implementation of hierarchical knowledge distillation (HieraDist) for continuous integrate-and-fire (CIF) based ASR models.

We propose hierarchical knowledge distillation (HKD, or HieraDist) to transfer knowledge from pre-trained language models (PLMs) to ASR models. HieraDist applies cross-modal knowledge distillation with a token-level contrastive loss at the acoustic level, and knowledge distillation with a regression loss at the linguistic level.

(Figure: overview of the HieraDist framework)

Please refer to the original paper for more details: Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation.
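To make the two distillation terms concrete, below is a minimal PyTorch-style sketch of what they could look like. The tensor shapes, projections, and reduction choices are illustrative assumptions, not the repository's exact implementation.

# Illustrative sketch only; not the repository's exact code.
# Assumes CIF acoustic embeddings and decoder states are already projected to
# the PLM hidden size and aligned one-to-one with the PLM token sequence.
import torch
import torch.nn.functional as F

def acoustic_contrastive_distillation(acoustic_emb, plm_token_emb, temperature=0.02):
    # Token-level contrastive (InfoNCE-style) loss at the acoustic level:
    # each CIF acoustic embedding (T, D) should match its own PLM token
    # embedding (positive) against the other tokens in the utterance (negatives).
    a = F.normalize(acoustic_emb, dim=-1)
    b = F.normalize(plm_token_emb, dim=-1)
    logits = a @ b.t() / temperature                    # (T, T) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # diagonal entries are positives
    return F.cross_entropy(logits, targets)

def linguistic_regression_distillation(decoder_states, plm_hidden_states):
    # Regression loss at the linguistic level: pull decoder hidden states
    # toward the PLM hidden states of the same token sequence.
    return F.mse_loss(decoder_states, plm_hidden_states)

In training, these two terms would be added to the standard ASR losses with tunable weights.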

What can you do with this repository?

  1. Train a CIF-based ASR model;
  2. Train a CIF-based ASR model with acoustic contrastive distillation (ACD);
  3. Train a CIF-based ASR model with linguistic regression distillation (LRD);
  4. Train a CIF-based ASR model with hierarchical knowledge distillation (HieraDist/HKD);
  5. Conduct model inference.

Usage

Installation

My default python version:

python==3.7.9

You should install all dependencies with the following commands:

cd CIF-HieraDist
pip install -r requirements.txt
pip install -e ./

Let's take the AISHELL-1 dataset as an example and navigate to its working directory:

cd egs/aishell1

Data Preparation

This repository is developed on top of Fairseq. Please refer to the original speech-to-text data preparation in Fairseq. You can also refer to https://github.com/MingLunHan/CIF-HieraDist/blob/main/examples/speech_to_text/prep_aishell1_data.py and modify it for your own datasets.

python ../../examples/speech_to_text/prep_aishell1_data.py --input-root ${YOUR_PATH_TO_AISHELL1} --output-root ./data/

Note that ${YOUR_PATH_TO_AISHELL1} is the parent directory of the AISHELL-1 dataset.
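If you want to sanity-check the prepared data, a small helper like the one below (illustrative only, not part of the repository) lists the generated manifests under ./data/; the exact file names and columns depend on the Fairseq speech-to-text preparation:

# Illustrative sanity check for the prepared manifests (assumed TSV layout).
import csv
from pathlib import Path

data_root = Path("./data")
for tsv in sorted(data_root.glob("*.tsv")):
    with tsv.open(encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    columns = list(rows[0].keys()) if rows else []
    print(f"{tsv.name}: {len(rows)} utterances, columns: {columns}")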

Model Training

To train a standard CIF-based ASR model, you should use the command:

bash run_train_aishell1_cif_small_exp35_14.sh

To train a CIF-based ASR model with HieraDist/HKD, you should first extract features from the PLM with the following command:

bash run_extract_plm_feats.sh
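Conceptually, this step runs the transcripts through the PLM and saves its hidden states. The sketch below illustrates the general idea using a HuggingFace Chinese BERT; it is an assumption for illustration, and the actual run_extract_plm_feats.sh script may differ in model choice, layers, and output layout:

# Illustrative PLM feature extraction; model name, transcript mapping, and
# output path are assumptions, not the repository's actual configuration.
import json
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese").eval()

# Hypothetical mapping from utterance id to transcript.
transcripts = {"utt_0001": "今天天气很好"}

features = {}
with torch.no_grad():
    for utt_id, text in transcripts.items():
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state.squeeze(0)  # (L, hidden_size)
        features[utt_id] = hidden.tolist()

with open("plm_feats.json", "w", encoding="utf-8") as f:  # illustrative output path
    json.dump(features, f, ensure_ascii=False)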

The path of the output JSON file of PLM features should be set in the configuration file under egs/aishell1/data. Then, you should use the command:

bash run_train_bert_distilled_cif_exp4_decdistill0p01_noscale_finalstate_contrastiveloss1p0_conttemp0p02_rmvrpt_neg700.sh

We provide the original training logs in egs/aishell1 for comparison.

Model Inference

To conduct inference with a trained ASR model, you should use the command:

bash run_infer.sh

We provide the original inference logs in egs/aishell1 for comparison.
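The results below are reported in character error rate (CER). For reference, a minimal illustrative CER computation (not part of the repository's inference scripts) could look like:

# Illustrative CER computation over reference/hypothesis transcript pairs.
def edit_distance(ref, hyp):
    # Levenshtein distance between two character sequences (single-row DP).
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def cer(refs, hyps):
    # CER (%) = total character edits / total reference characters * 100.
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / sum(len(r) for r in refs)

print(cer(["今天天气很好"], ["今天天气很好"]))  # -> 0.0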

Key Results

Without using any extra language model, we obtain the following results:

Methods            dev (CER %)    test (CER %)
CIF                4.5            4.9
CIF + ACD          4.2            4.7
CIF + LRD          4.0            4.5
CIF + HieraDist    3.8            4.2 (4.1 with better decoding hyper-parameters in later experiments)

With a language model trained on the AISHELL-1 transcripts themselves, we obtain:

Methods            dev (CER %)    test (CER %)
CIF                4.4            4.8
CIF + ACD          4.2            4.6
CIF + LRD          4.0            4.4
CIF + HieraDist    3.8            4.1

Acknowledgments

This repository is developed on top of Fairseq. Thanks to Facebook AI Research (FAIR) for releasing the Fairseq framework.

Other Resources

Citation

If you are inspired by this work, use the core code of this repository in your development, or conduct related research, please cite the paper with the following BibTeX entry:

@inproceedings{han23_interspeech,
  author={Minglun Han and Feilong Chen and Jing Shi and Shuang Xu and Bo Xu},
  title={{Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1364--1368},
  doi={10.21437/Interspeech.2023-423}
}

Thanks!
