Abstract:
Individuals with hearing impairments often struggle to understand spoken language in noisy environments, which can lead to social isolation and a decreased quality of life. In addition, patients with conditions such as aphonia, or those who have undergone a laryngectomy, are unable to produce speech, which further limits their ability to communicate effectively.
To address these challenges, there is a need for a technology that can reconstruct spoken language from visual cues alone, in a form that is accessible to individuals with hearing impairments and patients with speech-related conditions. This technology would involve converting the visual input in a video, such as lip and facial movements, into speech, allowing spoken language to be recovered and understood even when it cannot be heard.
This project aims to develop a visual hearing technology that can accurately reconstruct spoken language from videos captured in silent or noisy environments. The technology will utilize machine learning techniques to generate a speech signal from video input.
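To make the intended pipeline concrete, the sketch below shows one plausible way such a video-to-speech model could be structured: a spatio-temporal encoder over mouth-region frames followed by a decoder that predicts mel-spectrogram frames, which a separate vocoder would later turn into a waveform. This is a minimal illustration only; the abstract does not specify an architecture, and every layer choice, size, and name here (e.g. `VideoToSpeechNet`) is an assumption, not the project's actual design.

```python
# Illustrative sketch only: a hypothetical video-to-speech network.
# All architectural choices below are assumptions for illustration.
import torch
import torch.nn as nn


class VideoToSpeechNet(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 256):
        super().__init__()
        # Spatio-temporal encoder over grayscale mouth crops: (B, 1, T, H, W)
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # collapse spatial dims, keep time axis
        )
        # Temporal model over the per-frame visual features
        self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        # Project each time step to one mel-spectrogram frame
        self.head = nn.Linear(2 * hidden, n_mels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 1, time, height, width)
        feats = self.encoder(frames)           # (B, 64, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1)  # (B, 64, T)
        feats = feats.transpose(1, 2)          # (B, T, 64)
        out, _ = self.rnn(feats)               # (B, T, 2*hidden)
        return self.head(out)                  # (B, T, n_mels) predicted mel frames


if __name__ == "__main__":
    model = VideoToSpeechNet()
    clip = torch.randn(2, 1, 30, 64, 64)  # 2 clips, 30 frames of 64x64 mouth crops
    mel = model(clip)
    print(mel.shape)  # torch.Size([2, 30, 80])
```

In such a setup, the predicted mel-spectrogram would still need a vocoder (e.g. Griffin-Lim or a neural vocoder) to produce an audible waveform; that stage is omitted here for brevity.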
A successful outcome will improve the communication abilities and quality of life of individuals with hearing impairments and speech-related conditions. The technology can be deployed in various settings, including social interactions, educational institutions, and workplaces, making these environments more inclusive and accessible.