A Flask API to convert speech to text using Offline Transcription methods - CMU Sphinx and DeepSpeech.
- DeepSpeech.ipynb - Run this file to generate the DeepSpeech model and store the model files in a folder called 'deepspeech-0.6.0-models'. This step must be completed before running anything else. (A model-setup sketch follows this list.)
- home.py - Main Python file containing the Flask APIs.
- video_structuring.py - Python script that converts the video/audio file into a .wav file (16 kHz, 16-bit, mono) trimmed to 50 seconds and saves it to the 'Files/Audio' folder. (A conversion sketch follows this list.)
- cmu_sphinx.py - Python code to convert the .wav file to text using CMU Sphinx. (A transcription sketch follows this list.)
- deep_speech.py - Python code to convert the .wav file to text using DeepSpeech. The output is stored in the 'Files/Transcript/output.txt' file. (A transcription sketch follows this list.)
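The model-setup step performed by DeepSpeech.ipynb can be reproduced with a short download script. This is a minimal sketch, assuming the standard Mozilla release tarball for v0.6.0 (the URL and the exact notebook cells are assumptions):

```python
import tarfile
import urllib.request

# Assumed URL of the pretrained DeepSpeech 0.6.0 model release on Mozilla's GitHub.
MODEL_URL = "https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz"

# Download the tarball and extract it; it unpacks into 'deepspeech-0.6.0-models/'.
urllib.request.urlretrieve(MODEL_URL, "deepspeech-0.6.0-models.tar.gz")
with tarfile.open("deepspeech-0.6.0-models.tar.gz", "r:gz") as tar:
    tar.extractall(".")
```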
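The conversion done by video_structuring.py can be expressed with pydub (which shells out to ffmpeg). A minimal sketch, assuming a convert_to_wav helper name and the output filename 'audio.wav' (both are illustrative, not necessarily the script's actual names):

```python
import os
from pydub import AudioSegment

def convert_to_wav(input_path, output_dir="Files/Audio"):
    """Convert an audio/video file to a 16 kHz, 16-bit, mono WAV trimmed to 50 seconds."""
    audio = AudioSegment.from_file(input_path)   # ffmpeg decodes most audio/video containers
    audio = audio.set_frame_rate(16000)          # 16 kHz sample rate
    audio = audio.set_sample_width(2)            # 16-bit samples
    audio = audio.set_channels(1)                # mono
    audio = audio[:50 * 1000]                    # keep the first 50 seconds (pydub works in ms)
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "audio.wav")  # hypothetical output filename
    audio.export(out_path, format="wav")
    return out_path
```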
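cmu_sphinx.py can be built on the SpeechRecognition package's offline PocketSphinx backend. A minimal sketch, assuming a transcribe_sphinx helper name and the transcript path mentioned above:

```python
import os
import speech_recognition as sr

def transcribe_sphinx(wav_path, out_path="Files/Transcript/output.txt"):
    """Transcribe a 16 kHz mono WAV file with CMU (Pocket)Sphinx and save the text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)         # read the entire file
    text = recognizer.recognize_sphinx(audio)     # offline recognition via pocketsphinx
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w") as f:
        f.write(text)
    return text
```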
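deep_speech.py can call the deepspeech 0.6.0 Python package directly. A minimal sketch, assuming the file names shipped in the 'deepspeech-0.6.0-models' folder and the documented default decoder parameters for that release:

```python
import os
import wave
import numpy as np
from deepspeech import Model

def transcribe_deepspeech(wav_path, out_path="Files/Transcript/output.txt"):
    """Transcribe a 16 kHz, 16-bit mono WAV file with DeepSpeech 0.6.0 and save the text."""
    # Paths assume the folder produced by DeepSpeech.ipynb; exact filenames may differ.
    model = Model("deepspeech-0.6.0-models/output_graph.pbmm", 500)   # beam width 500
    model.enableDecoderWithLM("deepspeech-0.6.0-models/lm.binary",
                              "deepspeech-0.6.0-models/trie",
                              0.75, 1.85)                             # lm_alpha, lm_beta
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
    text = model.stt(audio)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w") as f:
        f.write(text)
    return text
```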
home.py contains two Flask APIs -
Request body (form-data):
'method': 'cmu' (for CMU Sphinx) or 'deepspeech' (for DeepSpeech),
'file': <the uploaded audio/video file>
Response: sends the 'output.txt' file containing the transcript of the uploaded audio back to the client.
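A minimal sketch of what the upload route in home.py might look like, assuming a hypothetical '/transcribe' endpoint name and the helper functions from the sketches above (the real routes and helper names may differ):

```python
import os
from flask import Flask, request, send_file
from flask_cors import CORS

# These imports assume the helpers sketched above live in the repo's
# corresponding modules; the real function names may differ.
from video_structuring import convert_to_wav
from cmu_sphinx import transcribe_sphinx
from deep_speech import transcribe_deepspeech

app = Flask(__name__)
CORS(app)

@app.route("/transcribe", methods=["POST"])      # hypothetical endpoint name
def transcribe():
    method = request.form["method"]              # 'cmu' or 'deepspeech'
    uploaded = request.files["file"]             # the uploaded audio/video file
    os.makedirs("Files", exist_ok=True)
    input_path = os.path.join("Files", uploaded.filename)
    uploaded.save(input_path)

    wav_path = convert_to_wav(input_path)        # 16 kHz, 16-bit, mono, 50 s WAV
    if method == "cmu":
        transcribe_sphinx(wav_path)
    else:
        transcribe_deepspeech(wav_path)

    # Return the transcript file to the client.
    return send_file("Files/Transcript/output.txt", as_attachment=True)

if __name__ == "__main__":
    app.run(debug=True)
```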
Some of the libraries and tools that need to be installed -
ffmpeg
pydub
flask-cors
srt
SpeechRecognition
Download swigwin and add it to PATH.
Download the Visual Studio C++ Build Tools and add them to PATH: https://visualstudio.microsoft.com/visual-cpp-build-tools/