
Speech Recognition Through Lip Motion Analysis

This study surveys progress in visual speech recognition (VSR), the task of interpreting spoken language from lip movements. It examines the architectural components of VSR systems, focusing on convolutional and recurrent neural network architectures, and evaluates them on the GRID corpus to allow a controlled, reproducible comparison. Proposed improvements target higher accuracy and broader applicability through state-of-the-art deep learning techniques, multi-modal audio-visual learning, and attention mechanisms.

The project proposes a deep learning model that converts video sequences of lip movements into spoken text. The model uses sequence-to-sequence learning with encoder-decoder components: a convolutional front end extracts visual features from each frame, and a recurrent back end maps the resulting feature sequence to textual output. Potential applications include speech recognition, accessibility tools, and audio-visual synchronization, with particular emphasis on diverse speaker populations and languages. The work highlights the importance of multi-modal strategies, user-centric design, and robust evaluation metrics for building more inclusive and effective communication systems.
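As a concrete illustration of the encoder-decoder pipeline described above, the following is a minimal sketch in Keras/TensorFlow. The input shape, layer sizes, use of 3D convolutions over a mouth-region crop, bidirectional GRUs, and the 28-symbol character set are assumptions chosen to match common VSR baselines on GRID (e.g., LipNet-style models); the repository's actual architecture and hyperparameters may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed preprocessing: 75 frames of 46x140 grayscale mouth-region crops,
# a common setup for GRID-corpus pipelines (hypothetical values here).
FRAMES, HEIGHT, WIDTH, CHANNELS = 75, 46, 140, 1
VOCAB_SIZE = 28  # assumed: 26 letters + space + CTC blank

def build_vsr_model() -> models.Model:
    inputs = layers.Input(shape=(FRAMES, HEIGHT, WIDTH, CHANNELS))

    # Encoder: 3D convolutions capture lip motion across neighbouring
    # frames as well as per-frame spatial appearance.
    x = inputs
    for filters in (32, 64, 96):
        x = layers.Conv3D(filters, (3, 5, 5), padding="same", activation="relu")(x)
        x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)

    # Collapse the spatial dimensions, keeping one feature vector per frame.
    x = layers.TimeDistributed(layers.Flatten())(x)

    # Decoder side: bidirectional GRUs model the temporal ordering of
    # visemes before a per-frame character classifier.
    x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)

    # Per-frame distribution over characters.
    outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_vsr_model()
model.summary()
```

Training such a model with a CTC objective (for example via tf.keras.backend.ctc_batch_cost) removes the need for frame-level alignment between video and transcript; that choice is assumed here for consistency with LipNet-style GRID baselines, not confirmed from the repository.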

About

This model deciphers sequences of lip movements captured in video frames and translates them into spoken language or phonetic representations.
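To obtain readable text, the per-frame character distributions produced by a model like the sketch above still need to be collapsed into a transcript. The following greedy CTC decoder is a minimal illustrative sketch under the same assumed 28-symbol character set; the repository's actual inference code may differ.

```python
import numpy as np

# Assumed character set matching the model sketch above: indices 0-25 are
# letters, 26 is space, 27 is the CTC blank token (all hypothetical).
CHARSET = "abcdefghijklmnopqrstuvwxyz "
BLANK_INDEX = 27

def greedy_ctc_decode(frame_probs: np.ndarray) -> str:
    """Collapse per-frame character probabilities of shape (T, vocab) into text.

    Standard CTC greedy decoding: take the argmax per frame, merge runs of
    repeated symbols, then drop blanks.
    """
    best = frame_probs.argmax(axis=-1)
    decoded = []
    previous = None
    for idx in best:
        if idx != previous and idx != BLANK_INDEX:
            decoded.append(CHARSET[idx])
        previous = idx
    return "".join(decoded)

# Example usage with the model above (shapes assumed):
# probs = model.predict(video_batch)[0]   # shape (75, 28)
# print(greedy_ctc_decode(probs))
```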
