Voice Recognition from EvilRP
Features
- Completely FREE (not using paid APIs like Microsoft/IBM/Google)
- Background noise removal during FFMPEG post-processing.
- Audio length reduction by removing the silent parts of the audio during FFMPEG post-processing.
- Developers can retranslate words in case the API misunderstands them; you'll see this happen if you use it for a while.
- As fast as it can be: long voice audio (over 5 seconds) should take under about 4000 ms to process. This depends on several factors (network latency for API response times, CPU load for the FFMPEG post-processing), so it will vary with your server/computer.
- Max Audio Length is 20 seconds due to API limitation.
- You can do whatever you want with your voice, with some coding of course.
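The post-processing and word retranslation features above can be sketched roughly as follows. This is only an illustration, not the code in speech_server.py: the filter values, the function names and the CORRECTIONS table are assumptions chosen for the example, and the right thresholds depend on your microphones.

```python
import re

# Hypothetical correction table: keys are words the recognizer tends to
# return, values are what the speaker actually said. Fill it in from
# your own observations.
CORRECTIONS = {"hans": "hands"}


def build_ffmpeg_command(src, dst):
    """Return an ffmpeg argument list that filters noise and trims silence."""
    filters = ",".join([
        "highpass=f=200",   # cut low-frequency rumble
        "lowpass=f=3000",   # cut hiss above the speech band
        # remove silent stretches (below -40 dB for at least 0.3 s)
        # throughout the clip
        "silenceremove=stop_periods=-1:stop_duration=0.3:stop_threshold=-40dB",
    ])
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace misrecognized words, matching case-insensitively."""
    def swap(match):
        word = match.group(0)
        return corrections.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", swap, text)
```

The command list can be passed to subprocess.run(cmd, check=True), and apply_corrections can be called on the recognized text before it is sent back to the game.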
Tips
- Recommended to run with more than one CPU core, even though a single core can work well.
- Recommended to run behind Cloudflare or any caching provider for improved response time and all the other features that Cloudflare provides.
Steps
- Create an account at https://wit.ai/
- Create a new application (make sure to set its visibility to Private and the recognition language to whichever language you want)
- Once the application has been created, go to the Management -> Settings tab
- There you will find the Server Access Token (API Key), which is used to process all the voice recognition requests. Copy it and place it in the speech_server.py that you downloaded from this repo
- Now your speech server is fully configured to recognize voices
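If you would rather not hard-code the key, one option is to read it from an environment variable. The sketch below is an assumption about how that could look, not how speech_server.py does it: the variable name WIT_AI_TOKEN and both function names are made up for the example, and the shape of the response returned by wit.ai is not shown here.

```python
import os


def load_wit_token(env_var="WIT_AI_TOKEN"):
    """Read the Server Access Token from the environment (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your wit.ai Server Access Token")
    return token


def recognize(audio_path, token):
    """Send a WAV file to wit.ai and return the raw response."""
    # Imported here so the token helper works even without the wit package.
    from wit import Wit

    client = Wit(token)
    with open(audio_path, "rb") as audio:
        return client.speech(audio, headers={"Content-Type": "audio/wav"})
```

With this approach the key stays out of the source file, which matters if you fork the repo publicly.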
Requirements
- FFMPEG (added in the system PATH)
- Python 3.8 or greater
- A server with more than one core in order to process the requests as fast as possible.
Python Libraries
- pip install mutagen
- pip install fastapi
- logging (part of the Python standard library, no installation needed)
- pip install wit
- pip install uvicorn
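Before starting the server, it can help to confirm that the requirements and libraries listed above are actually present. The following is a minimal sketch of such a check; the function name and the message strings are invented for the example.

```python
import importlib.util
import shutil
import sys

# Third-party packages from the list above (logging ships with Python).
REQUIRED_MODULES = ("mutagen", "fastapi", "wit", "uvicorn")


def missing_requirements(modules=REQUIRED_MODULES, tools=("ffmpeg",)):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or greater")
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package: {name}")
    for tool in tools:
        # shutil.which searches the system PATH
        if shutil.which(tool) is None:
            problems.append(f"executable on PATH: {tool}")
    return problems
```

Calling print(missing_requirements()) on the server lists whatever still needs to be installed.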
How to run the Speech Server
- Open a cmd (inside your speech_server folder) and run uvicorn speech_server:app --workers 8
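The same command can be built from Python if you want the worker count to follow the number of CPU cores instead of a fixed 8. This is a sketch under assumptions: the function name is made up, and the host value is an addition not present in the command above. By default uvicorn listens on 127.0.0.1 only, so a server that must be reachable directly by IP would need --host 0.0.0.0.

```python
import os
import sys


def uvicorn_command(workers=None, host="0.0.0.0", port=8000):
    """Build the argument list for launching speech_server:app."""
    if workers is None:
        # Fall back to one worker if the core count cannot be determined
        workers = os.cpu_count() or 1
    return [
        sys.executable, "-m", "uvicorn", "speech_server:app",
        "--host", host,
        "--port", str(port),
        "--workers", str(workers),
    ]
```

To start the server, pass the list to subprocess.run(uvicorn_command(), check=True) from inside the speech_server folder.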
Although it is not mandatory, it is recommended to run this service behind Cloudflare or any other reverse proxy HTTP service. To send requests from FiveM to your speech_server.py, set Config.Endpoint in configuration.lua to the URL where requests will be sent.
For example, if you are running this behind a domain/Cloudflare etc., your speech_server listening port should be set to 80:
https://voiceserver.roleplay.net/speech
But if you are not using a domain/Cloudflare, your URL should look like this (make sure that the TCP port is open):
http://163.210.34.39:8000/speech
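The two URL forms above follow the same pattern, which the small helper below reproduces. It is only an illustration for working out the value to put in Config.Endpoint; the function name and its parameters are hypothetical.

```python
def build_endpoint(host, port=None, https=False, path="/speech"):
    """Return the endpoint URL, omitting the port when it is the default."""
    scheme = "https" if https else "http"
    default_port = 443 if https else 80
    if port in (None, default_port):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return f"{scheme}://{netloc}{path}"


# Domain behind a proxy versus a bare IP with an explicit port
print(build_endpoint("voiceserver.roleplay.net", https=True))
print(build_endpoint("163.210.34.39", port=8000))
```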
This is completely open source: you can fork it, recode it, restyle it, or do whatever you want with it. Pull requests that improve the code are welcome. As always, I build these things for myself, and since I no longer use this one, I don't give as much support as I would for an active project.
This is how the UI looks in FiveM
This was run on my one-core CPU machine, so don't expect worse results than this one!