dc.description.abstract |
Extensive research has been done on language-related applications, especially the use of machine learning for speech recognition. More recent research has focused on the use of deep learning in voice-based applications. Deep learning has become an active field of study and, depending on the application, has produced results far superior to earlier approaches. The proposed project uses machine learning and deep neural networks to recognize spoken language from a one-second audio file. It also aims to estimate the speaker's age and gender from various files supplied as input to the system. A secondary function of the project is to execute voice-activated commands from specific speakers. The target audience for this project overview is college students, mostly undergraduates, and the project may also serve as an open-source project on GitHub. Commercial advertising becomes more relevant when it can target specific age and gender groups, which can increase sales. Forensic analysis can narrow the pool of suspects when evidence such as a phone call is available. Age and gender classification is especially useful in a variety of real-world applications such as security and video monitoring, electronic customer relationship management, biometrics, electronic vending machines, human-computer interaction, entertainment, cosmetics, and forensic arts. The main and expected features of the project are:
Speech-to-Text from audio file
Speech-to-Text from real-time audio
Gender and age estimation
Speaker estimation |
en_US |