This repository contains end-to-end automatic speech recognition models. It does not include training code or audio and text preprocessing code. If you want to see code other than the models, please refer to here.
Many open-source speech recognition projects bundle all of the training-related code with the models, which makes it hard to see the model structure on its own. So I created a repository that contains only the models I've implemented and made it public.
I will keep adding the speech recognition models I implement to this repository.
- Deep Speech 2
  - Dario Amodei et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
  - SeanNaren. deepspeech.pytorch
- Listen, Attend and Spell (modified version)
  - William Chan et al. Listen, Attend and Spell
  - Takaaki Hori et al. Advances in Joint CTC-Attention based E2E ASR with a Deep CNN Encoder and RNN-LM
  - IBM. Pytorch-seq2seq
  - clovaai. ClovaCall
- Speech Transformer
  - Ashish Vaswani et al. Attention Is All You Need
  - Yuanyuan Zhao et al. The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition
  - kaituoxu. Speech-Transformer
- Jasper
  - Jason Li et al. Jasper: An End-to-End Convolutional Neural Acoustic Model
  - NVIDIA. DeepLearningExamples
- Voice Activity Detection (1-dimensional ResNet model)
  - filippogiruzzi. voice_activity_detection
If you have any questions, bug reports, or feature requests, please open an issue on GitHub.
I appreciate any kind of feedback or contribution. Feel free to go ahead with small changes such as bug fixes and documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues first.
I follow PEP-8 for code style. Docstring style is especially important, because the documentation is generated from the docstrings.
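As a rough illustration of the kind of docstring this refers to, here is a minimal sketch. The function `conv_output_length` is a hypothetical helper written for this example, not code from this repository, and the Google-style `Args` / `Returns` / `Raises` layout is an assumption about the docstring convention.

```python
def conv_output_length(input_length: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Compute the output length of a 1D convolution without dilation.

    Hypothetical helper used only to illustrate docstring layout.

    Args:
        input_length (int): number of input time steps
        kernel_size (int): size of the convolution kernel
        stride (int): step between kernel applications (default: 1)
        padding (int): zero-padding added to both sides (default: 0)

    Returns:
        int: number of output time steps

    Raises:
        ValueError: if stride is not positive
    """
    if stride <= 0:
        raise ValueError("stride must be a positive integer")

    # Standard formula: floor((L + 2p - k) / s) + 1
    return (input_length + 2 * padding - kernel_size) // stride + 1
```

A docstring laid out this way stays readable in the source and can also be picked up by documentation generators that parse structured sections.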
This project is licensed under the Apache-2.0 License. See the LICENSE.md file for details.
- Soohwan Kim @sooftware
- Contacts: kaki.brain@kakaobrain.com