Empowering Voices, Bridging Silences.
For a comprehensive demonstration of our VocalAid project in action, please watch the following video:
For many in the deaf community, mastering pronunciation can be a significant challenge due to the limited feedback on their speech production. Traditional methods of speech training often require extensive interaction with speech therapists, which can be costly and not always accessible. Furthermore, the feedback loop in learning pronunciation is crucial; without the ability to hear their own voice, deaf individuals may struggle to correct and refine their speech patterns. This situation underscores a broader issue of inclusivity and accessibility in speech training resources.
VocalAid aims to bridge this gap by leveraging technology to provide an innovative solution. Our application is designed to assist deaf individuals in improving their pronunciation through interactive feedback. Here’s how we address the problem:
- Immediate Feedback: Using advanced speech recognition technology, VocalAid provides real-time feedback on pronunciation, so users can adjust their speech as they practice.
- Visual Learning: The app breaks down words into syllables and displays visual cues, making it easier for users to understand and mimic the correct pronunciation.
- Customized Learning Paths: Based on the user's progress, VocalAid customizes the learning experience, focusing on areas that need improvement, thus making the learning process more efficient and effective.
- Accessible Anywhere: By creating a mobile application, we ensure that our solution is accessible to anyone with a smartphone, overcoming the barrier of needing to be physically present at a speech therapy session.
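The feedback loop described above can be illustrated with a small sketch: compare the syllables a recognizer heard against the target word's syllables and flag mismatches. The syllable table and the scoring function below are hypothetical stand-ins for illustration, not VocalAid's actual data or API.

```python
# Hedged sketch of syllable-level pronunciation feedback.
# TARGET_SYLLABLES and score_pronunciation are illustrative assumptions,
# not part of VocalAid's real codebase.

TARGET_SYLLABLES = {
    "water": ["wa", "ter"],
    "banana": ["ba", "na", "na"],
}

def score_pronunciation(word, recognized):
    """Compare recognized syllables to the target; return accuracy and per-syllable feedback."""
    target = TARGET_SYLLABLES[word]
    feedback = []
    for i, expected in enumerate(target):
        heard = recognized[i] if i < len(recognized) else None
        feedback.append({
            "expected": expected,
            "heard": heard,
            "correct": heard == expected,
        })
    accuracy = sum(f["correct"] for f in feedback) / len(target)
    return accuracy, feedback

# Example: the user pronounced "wa-der" instead of "wa-ter".
accuracy, feedback = score_pronunciation("water", ["wa", "der"])
for f in feedback:
    marker = "ok" if f["correct"] else "retry"
    print(f'{f["expected"]:>4} -> {f["heard"]}: {marker}')
```

In a real app, the recognized syllables would come from the speech recognition model, and the per-syllable result would drive the visual cues shown to the user.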
Through VocalAid, we aspire to empower individuals with hearing impairments by providing them with the tools they need to improve their speech autonomously, fostering greater independence and confidence in their communication abilities.
Our project aligns with the United Nations Sustainable Development Goals (SDGs), focusing on three critical areas:
- Quality Education: Enhancing access to inclusive educational resources and lifelong learning opportunities.
- Reduced Inequalities: Addressing disparities and enabling equitable access to technology and resources.
- Peace, Justice, and Strong Institutions: Contributing to the creation of inclusive societies with equal opportunities for all.
To build VocalAid, we rely on the following technologies:
- Flutter: For cross-platform mobile application development.
- Firebase: For secure data storage and user authentication.
- TensorFlow: For fine-tuning speech recognition models.
- Google Cloud: For hosting our services with high availability.
To train our speech recognition models, we secured a dataset from the research study "Corpus of deaf speech for acoustic and speech production," which can be found here. Provided by Dr. Lisa Lucks Mendel of the University of Memphis, this dataset aligns closely with our project's aims, offering a range of speech samples essential for our tool.
Owing to confidentiality agreements, we are unable to distribute this dataset. Those interested in accessing the dataset may request permission by reaching out to the original authors.
Our development pipeline consisted of the following stages:
- Preprocessing: Cleaning and formatting the data to ensure consistency across the dataset.
- Augmentation: Enhancing the dataset with synthetic variations to improve model robustness.
- Model Development: Crafting a neural network capable of efficient speech recognition. This involved:
  - Preparing the dataset by loading and preprocessing audio data.
  - Designing a convolutional neural network architecture, with layers suited for feature extraction from audio signals and dense layers for classification.
  - Implementing a training regimen with validation sets to fine-tune performance, employing callbacks to save the most effective model iteration based on validation accuracy.
- Testing and Validation: Extensive testing and validation to refine the model, aiming for the highest possible accuracy and reliability in speech recognition.
- Continuous Learning: Regularly updating the model with new data to enhance its performance and adaptability over time.
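The preprocessing and augmentation stages can be sketched as follows. This is a minimal NumPy illustration: peak normalization for consistency across clips, plus Gaussian noise and a random time shift to create synthetic variants. The specific parameters (noise level, shift range) are assumptions, not our production values.

```python
import numpy as np

def preprocess(waveform):
    """Clean and format: cast to float32 and peak-normalize into [-1, 1]."""
    x = np.asarray(waveform, dtype=np.float32)
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def augment(waveform, noise_std=0.005, max_shift=160, rng=None):
    """Create a synthetic variant: add Gaussian noise and a random time shift."""
    rng = rng or np.random.default_rng(0)
    noisy = waveform + rng.normal(0.0, noise_std, size=waveform.shape)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(noisy, shift).astype(np.float32)

# Example: normalize a short clip, then generate two augmented copies.
clip = preprocess([0.0, 0.5, -1.5, 0.25])
variants = [augment(clip, rng=np.random.default_rng(seed)) for seed in (1, 2)]
```

Each augmented copy keeps the same label as its source clip, which multiplies the effective size of the training set.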
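The model-development stage described above can be sketched with tf.keras, since TensorFlow is part of our stack. The input shape, layer sizes, and number of target classes below are illustrative assumptions, not VocalAid's actual architecture; the checkpoint callback mirrors the step of saving the best iteration by validation accuracy.

```python
# Hedged sketch: a small CNN for spectrogram classification.
# Shapes and sizes are assumptions for illustration only.
import tensorflow as tf

NUM_CLASSES = 10             # assumed number of target words
INPUT_SHAPE = (124, 129, 1)  # assumed spectrogram: time frames x frequency bins

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=INPUT_SHAPE),
        # Convolutional layers extract local time-frequency features.
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        # Dense layers perform the final classification.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Keep only the best iteration by validation accuracy.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras", monitor="val_accuracy", save_best_only=True)
# Training would then run against prepared datasets, e.g.:
# model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=[checkpoint])
```

The `fit` call is commented out because it requires the prepared audio datasets; the sketch only shows the architecture and checkpointing pattern.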
This guide provides a step-by-step overview of how to use VocalAid, featuring screenshots and descriptions for each key feature.
Follow these steps to maximize your learning experience with VocalAid and enhance your pronunciation skills effectively.
As VocalAid continues to evolve, our vision for the future includes several ambitious goals:
- Sentence Level Practice: Expanding our focus from word-level to sentence-level practice, enhancing the user's ability to communicate effectively in real-world scenarios.
- Visualization of Speech Mechanics: Developing features to help users visualize the mechanics of speech, including lip movement, tongue placement, and mouth shape, during pronunciation practice. This will aid in providing a more comprehensive learning experience.
- Support for Low-Resource Languages: Extending our application's capabilities to include support for low-resource languages, making VocalAid accessible to a wider audience and fostering inclusivity.
- Advanced Feedback Mechanisms: Implementing more sophisticated feedback mechanisms that leverage AI to provide even more personalized and effective pronunciation guidance.
By pursuing these goals, VocalAid aims to become an even more powerful tool for individuals seeking to improve their pronunciation and communication skills, regardless of their linguistic background or hearing ability.
Salman Ahmad GitHub | LinkedIn
Abdullah Zubair GitHub | LinkedIn
Muhammad Riyan GitHub | LinkedIn
Zaraar Malik GitHub | LinkedIn