A Flask API to convert speech to text using Offline Transcription methods - CMU Sphinx and DeepSpeech.
- DeepSpeech.ipynb - Run this file to generate the DeepSpeech model and store the model files in a folder called 'deepspeech-0.6.0-models'. This step must be completed before running anything else. (A model-setup sketch follows this list.)
- home.py - Main Python file containing the Flask APIs.
- video_structuring.py - Python script that converts the video/audio file into a .wav file (16 kHz, 16-bit, mono) trimmed to 50 seconds and saves it to the 'Files/Audio' folder. (A conversion sketch follows this list.)
- cmu_sphinx.py - Python code to convert the .wav file to text using CMU Sphinx. (A transcription sketch follows this list.)
- deep_speech.py - Python code to convert the .wav file to text using DeepSpeech. The output is stored in the 'Files/Transcript/output.txt' file. (A transcription sketch follows this list.)
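The model-setup step performed by DeepSpeech.ipynb can be reproduced with a short download script. This is a minimal sketch, assuming the standard Mozilla release tarball for v0.6.0 (the URL and the exact notebook cells are assumptions):

```python
import tarfile
import urllib.request

# Assumed URL of the pretrained DeepSpeech 0.6.0 model release on Mozilla's GitHub.
MODEL_URL = "https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz"

# Download the tarball and extract it; it unpacks into 'deepspeech-0.6.0-models/'.
urllib.request.urlretrieve(MODEL_URL, "deepspeech-0.6.0-models.tar.gz")
with tarfile.open("deepspeech-0.6.0-models.tar.gz", "r:gz") as tar:
    tar.extractall(".")
```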
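The conversion done by video_structuring.py can be expressed with pydub (which shells out to ffmpeg). A minimal sketch, assuming a convert_to_wav helper name and the output filename 'audio.wav' (both are illustrative, not necessarily the script's actual names):

```python
import os
from pydub import AudioSegment

def convert_to_wav(input_path, output_dir="Files/Audio"):
    """Convert an audio/video file to a 16 kHz, 16-bit, mono WAV trimmed to 50 seconds."""
    audio = AudioSegment.from_file(input_path)   # ffmpeg decodes most audio/video containers
    audio = audio.set_frame_rate(16000)          # 16 kHz sample rate
    audio = audio.set_sample_width(2)            # 16-bit samples
    audio = audio.set_channels(1)                # mono
    audio = audio[:50 * 1000]                    # keep the first 50 seconds (pydub works in ms)
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "audio.wav")  # hypothetical output filename
    audio.export(out_path, format="wav")
    return out_path
```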
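cmu_sphinx.py can be built on the SpeechRecognition package's offline PocketSphinx backend. A minimal sketch, assuming a transcribe_sphinx helper name and the transcript path mentioned above:

```python
import os
import speech_recognition as sr

def transcribe_sphinx(wav_path, out_path="Files/Transcript/output.txt"):
    """Transcribe a 16 kHz mono WAV file with CMU (Pocket)Sphinx and save the text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)         # read the entire file
    text = recognizer.recognize_sphinx(audio)     # offline recognition via pocketsphinx
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w") as f:
        f.write(text)
    return text
```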
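deep_speech.py can call the deepspeech 0.6.0 Python package directly. A minimal sketch, assuming the file names shipped in the 'deepspeech-0.6.0-models' folder and the documented default decoder parameters for that release:

```python
import os
import wave
import numpy as np
from deepspeech import Model

def transcribe_deepspeech(wav_path, out_path="Files/Transcript/output.txt"):
    """Transcribe a 16 kHz, 16-bit mono WAV file with DeepSpeech 0.6.0 and save the text."""
    # Paths assume the folder produced by DeepSpeech.ipynb; exact filenames may differ.
    model = Model("deepspeech-0.6.0-models/output_graph.pbmm", 500)   # beam width 500
    model.enableDecoderWithLM("deepspeech-0.6.0-models/lm.binary",
                              "deepspeech-0.6.0-models/trie",
                              0.75, 1.85)                             # lm_alpha, lm_beta
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
    text = model.stt(audio)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w") as f:
        f.write(text)
    return text
```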
home.py contains two Flask APIs -
Request body (form-data):
'method': 'cmu' (for CMU Sphinx) or 'deepspeech' (for DeepSpeech),
'file': <the uploaded audio/video file>
Response: sends the 'output.txt' file containing the transcript of the uploaded audio back to the client.
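A minimal sketch of what the upload route in home.py might look like, assuming a hypothetical '/transcribe' endpoint name and the helper functions from the sketches above (the real routes and helper names may differ):

```python
import os
from flask import Flask, request, send_file
from flask_cors import CORS

# These imports assume the helpers sketched above live in the repo's
# corresponding modules; the real function names may differ.
from video_structuring import convert_to_wav
from cmu_sphinx import transcribe_sphinx
from deep_speech import transcribe_deepspeech

app = Flask(__name__)
CORS(app)

@app.route("/transcribe", methods=["POST"])      # hypothetical endpoint name
def transcribe():
    method = request.form["method"]              # 'cmu' or 'deepspeech'
    uploaded = request.files["file"]             # the uploaded audio/video file
    os.makedirs("Files", exist_ok=True)
    input_path = os.path.join("Files", uploaded.filename)
    uploaded.save(input_path)

    wav_path = convert_to_wav(input_path)        # 16 kHz, 16-bit, mono, 50 s WAV
    if method == "cmu":
        transcribe_sphinx(wav_path)
    else:
        transcribe_deepspeech(wav_path)

    # Return the transcript file to the client.
    return send_file("Files/Transcript/output.txt", as_attachment=True)

if __name__ == "__main__":
    app.run(debug=True)
```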
Some of the libraries and tools that need to be installed -
ffmpeg
pydub
flask-cors
srt
SpeechRecognition
Download swigwin and add it to PATH.
Download the Visual Studio C++ Build Tools and add them to PATH: https://visualstudio.microsoft.com/visual-cpp-build-tools/