Voice Recognition Assistant

Inspired by Google Home and Amazon Alexa, this project aims to create a more user-customizable voice assistant that behaves like a music player: it plays a song requested by the user, stores favorites in a playlist, and plays songs from that playlist either in order or at random.

Program Explanation

Speech-to-Text

To convert the user's speech into text, the program uses the speech_recognition library. It selects the microphone, listens for audio input, and attempts to transcribe the captured audio using Google's speech recognition service. If the conversion fails, the program reports that an error has occurred.
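The listen-then-transcribe flow might look roughly like the sketch below. It is not the project's actual code: the function names `transcribe` and `listen_and_transcribe` are hypothetical, and the broad `except` is an assumption that mirrors the "report an error" behaviour described above.

```python
def transcribe(recognize, audio):
    """Return the text for `audio`, or None if recognition fails."""
    try:
        return recognize(audio)
    except Exception as exc:  # broad on purpose: any failure is just reported
        print(f"Speech-to-text error: {exc}")
        return None


def listen_and_transcribe():
    """Capture one phrase from the default microphone and transcribe it."""
    # Imported lazily so transcribe() can be used without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    # recognize_google sends the audio to Google's web speech service.
    return transcribe(recognizer.recognize_google, audio)
```

Calling `listen_and_transcribe()` needs a working microphone and an internet connection. Keeping `transcribe` separate makes the error handling easy to exercise without either.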

Text-to-Speech

So that the program can respond to user commands, a text-to-speech step was set up using IBM Watson's Text to Speech library. Note: an API key for this service must be created on IBM's website. The function takes the program's reply and synthesizes it into a .wav file, which is played for the user and then deleted.
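The synthesize, play, delete lifecycle can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `speak` and `watson_synthesizer` are hypothetical names, and the synthesizer and player are passed in as callables so the file handling can be tried without an API key.

```python
import os
import tempfile


def speak(text, synthesize, play):
    """Synthesize `text` to a temporary .wav file, play it, then delete it.

    `synthesize(text)` must return WAV bytes and `play(path)` must block
    until playback ends; both are hypothetical callables for this sketch.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as wav_file:
            wav_file.write(synthesize(text))
        play(path)
    finally:
        os.remove(path)  # the reply file is only needed for one playback


def watson_synthesizer(api_key, service_url):
    """Build a `synthesize` callable backed by IBM Watson Text to Speech."""
    # Imported lazily so speak() works without the IBM SDK installed.
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson import TextToSpeechV1

    service = TextToSpeechV1(authenticator=IAMAuthenticator(api_key))
    service.set_service_url(service_url)
    return lambda text: service.synthesize(text, accept="audio/wav").get_result().content
```

A real call would look something like `speak(reply, watson_synthesizer(key, url), play_wav)`, where `play_wav` is whatever blocking audio player the program uses.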

Components of NLP ChatBot

The Transformers library offers pre-trained models for a variety of natural language processing tasks. This project uses the DialoGPT-medium model, which is trained for dialogue generation and can produce coherent replies based on user interactions. The chatbot runs in a continuous loop: it listens for user input, processes that input, and generates a response. The recognized text is wrapped in a Conversation object and passed to the NLP pipeline. The response is then extracted from the model's output and trimmed to retain only the relevant text.
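As a rough illustration of the generate-and-trim step, the sketch below calls DialoGPT-medium directly through the tokenizer and model rather than through the Conversation object and pipeline described above, since that helper is not available in every release of Transformers. The function names and the `max_length` default are assumptions made for this example.

```python
def load_dialogpt(name="microsoft/DialoGPT-medium"):
    """Download (or load from cache) the tokenizer and model."""
    # Imported lazily: transformers and torch are heavy third-party packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    return AutoTokenizer.from_pretrained(name), AutoModelForCausalLM.from_pretrained(name)


def generate_reply(user_text, tokenizer, model, max_length=200):
    """Return only the newly generated reply text for one user turn."""
    # DialoGPT expects each turn to end with the end-of-sequence token.
    input_ids = tokenizer.encode(user_text + tokenizer.eos_token, return_tensors="pt")
    output_ids = model.generate(
        input_ids, max_length=max_length, pad_token_id=tokenizer.eos_token_id
    )
    # The output repeats the prompt, so keep only the tokens after it.
    reply_ids = output_ids[0][input_ids.shape[-1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True).strip()
```

In the chatbot loop this would be used as `tokenizer, model = load_dialogpt()` once at startup, then `generate_reply(text, tokenizer, model)` for each recognized phrase. This version handles a single turn and does not carry conversation history forward.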

Audio Playing

Play Music Function: Once the program recognizes that the user wants to play some audio, it cleans up the input text to isolate the music title. It then constructs a YouTube search URL to find the track using the provided music name, employing the requests library and regular expressions (regex). The program fetches the search results and retrieves the URL of the first video from YouTube, extracting the video title using BeautifulSoup. Next, it checks if the track is already downloaded (in WAV format) in the program's directory. If it is, the program plays the track using simpleaudio. If the track is not found locally, it downloads it using youtube_dl and ffmpeg, saving it in the specified directory. Special characters in the file name are removed to prevent potential errors.
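The text-handling steps above (isolating the title, building the search URL, picking the first video, and sanitizing the file name) can be sketched with the standard library alone. All function names are hypothetical, the "play" trigger word is an assumption, and the regular expressions are one plausible choice rather than the ones the project uses. Fetching the page, downloading, and playback are left out.

```python
import re
from pathlib import Path
from urllib.parse import quote_plus


def extract_title(command):
    """Drop a leading 'play' keyword (an assumed trigger word) from the command."""
    return re.sub(r"^\s*play\s+", "", command, flags=re.IGNORECASE).strip()


def search_url(title):
    """Build the YouTube results URL for a title."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(title)


def first_video_url(results_html):
    """Return the watch URL of the first video ID found in the page, or None."""
    match = re.search(r"watch\?v=([\w-]{11})", results_html)
    return f"https://www.youtube.com/watch?v={match.group(1)}" if match else None


def safe_filename(video_title):
    """Remove characters other than letters, digits, spaces, '-' and '_'."""
    cleaned = re.sub(r"[^\w\s-]", "", video_title)
    return re.sub(r"\s+", " ", cleaned).strip() + ".wav"


def local_track(video_title, audio_dir):
    """Return the path of the already-downloaded track, or None if absent."""
    path = Path(audio_dir) / safe_filename(video_title)
    return path if path.is_file() else None
```

With these helpers, the program would fetch `search_url(extract_title(command))`, pass the response body to `first_video_url`, and download only when `local_track` returns None.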

Remove Function: The remove function searches for a URL on YouTube using the same methods. If it finds a matching URL for the user's query, it retrieves the title and then searches for that title in the audio directory. If the file is found, it is deleted.
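Once the title has been resolved to a file name, the deletion step itself is small. A minimal sketch, assuming the file name has already been sanitized the same way as at download time (`remove_track` is a hypothetical name):

```python
from pathlib import Path


def remove_track(file_name, audio_dir):
    """Delete `file_name` from `audio_dir`; return True if a file was removed."""
    path = Path(audio_dir) / file_name
    if path.is_file():
        path.unlink()
        return True
    return False
```

Returning a boolean lets the assistant tell the user whether anything was actually deleted.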

Play Playlist and Play Random Functions: The playlist function iterates through all existing audio files, playing them one at a time. In contrast, the random play function selects a random file from the directory and plays it.
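Both modes reduce to listing the audio directory and choosing what to play. The sketch below assumes tracks are stored as .wav files and that "in order" means alphabetical by file name. The function names and the `play` callable are hypothetical stand-ins for the program's own player.

```python
import random
from pathlib import Path


def list_tracks(audio_dir):
    """Return the .wav files in `audio_dir`, sorted by name."""
    return sorted(Path(audio_dir).glob("*.wav"))


def play_playlist(audio_dir, play):
    """Play every track in order; `play(path)` is a hypothetical blocking player."""
    for track in list_tracks(audio_dir):
        play(track)


def play_random(audio_dir, play, rng=random):
    """Play one randomly chosen track and return it, or None if there are none."""
    tracks = list_tracks(audio_dir)
    if not tracks:
        return None
    track = rng.choice(tracks)
    play(track)
    return track
```

With simpleaudio, `play` could be something like `lambda p: simpleaudio.WaveObject.from_wave_file(str(p)).play().wait_done()`, which blocks until the track finishes.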

Authors

@AumkarMali

Deployment

Download Arduino IDE from https://docs.arduino.cc/software/ide-v1/tutorials/Windows

Links

➊ GitHub: https://github.com/AumkarMali/

➋ YouTube: https://www.youtube.com/channel/UC7rhCKur9bF01lV0pNJNkvA
