Automatic Speech Recognition (ASR) is the task of transforming a speech waveform into a transcript sequence. Previous state-of-the-art systems in this field rely mainly on supervised or semi-supervised learning, which limits recognition to widely used languages with large labeled corpora. In 2021, however, Baevski et al. introduced a well-performing unsupervised speech recognition algorithm, wav2vec-U, in their paper. Its unsupervised nature makes automatic speech recognition feasible for low-resource languages. We studied and reimplemented this method ourselves in PyTorch, together with an ablation study attempting to improve on the baseline performance.
Since this is a class assignment, our code is not published on GitHub. Please read our project report (Jing Li & Hongling Lei) for more details.
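
For readers unfamiliar with the approach, the sketch below illustrates the adversarial idea at the heart of wav2vec-U: a generator maps unlabeled speech features to per-frame phoneme distributions, while a discriminator tries to distinguish them from phonemized unpaired text. This is a minimal illustrative example, not the authors' code or our project implementation; the module sizes, kernel widths, feature dimension, and phoneme inventory are assumptions made up for the sketch.

```python
# Minimal sketch of GAN-style unsupervised ASR training (illustrative only).
import torch
import torch.nn as nn

FEAT_DIM, N_PHONES = 512, 40  # assumed speech-feature size and phoneme inventory


class Generator(nn.Module):
    """1-D convolution over speech features -> per-frame phoneme distributions."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(FEAT_DIM, N_PHONES, kernel_size=4, padding=2)

    def forward(self, feats):                      # feats: (B, T, FEAT_DIM)
        x = self.conv(feats.transpose(1, 2))       # (B, N_PHONES, T')
        return x.transpose(1, 2).softmax(-1)       # phoneme distribution per frame


class Discriminator(nn.Module):
    """Scores whether a phoneme sequence looks like real (phonemized) text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_PHONES, 256, kernel_size=4, padding=2), nn.GELU(),
            nn.Conv1d(256, 1, kernel_size=4, padding=2),
        )

    def forward(self, dist):                       # dist: (B, T, N_PHONES)
        return self.net(dist.transpose(1, 2)).mean(dim=(1, 2))  # one logit per item


gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

# Dummy batch: unlabeled speech features and unpaired text as one-hot phonemes.
speech_feats = torch.randn(8, 100, FEAT_DIM)
text_phones = torch.nn.functional.one_hot(
    torch.randint(N_PHONES, (8, 100)), N_PHONES).float()

# Discriminator step: real text -> 1, generated distributions -> 0.
d_real = disc(text_phones)
d_fake = disc(gen(speech_feats).detach())
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make generated distributions look like real text.
d_fake = disc(gen(speech_feats))
loss_g = bce(d_fake, torch.ones_like(d_fake))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The full method also relies on self-supervised wav2vec 2.0 features, segmentation, and auxiliary losses, which are omitted here; see the report for how those pieces fit together.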