Abstract:
This project demonstrates distributed computing through a speaker identification
application. A database approach is used to identify a specific person: the voice of an
unknown speaker is compared with previously stored data. A cluster of about 5 computers
is set up using Hadoop, and all processing runs on this cluster.
First, a database of voice features is created and maintained in HBase, Hadoop’s
database. The voice samples are recorded from students and teachers in the college.
Training begins by pre-processing the voice inputs to remove noise. MFCC features are
then extracted from these voice signals using Matlab VoiceBox. Vector Quantization, a
compression technique, is used to reduce the number of feature vectors obtained from
MFCC: K-Means is applied as the clustering technique and the result is saved into the
HBase database. This is the training data. To identify an unknown speaker, the voice
signal is taken as input and pre-processed, its MFCC features are extracted, and these
are compared with every entry of the database to find the most likely match. The
matching is done in parallel: inputs are distributed over the cluster, and MapReduce
tasks are performed on all the nodes running a TaskTracker. All the nodes are controlled
by a master node on which the JobTracker and NameNode are running. Major applications
of the project include site access, credit card authorization, secure phone access to
banking, database services, and access to secure equipment.
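
The training and matching steps summarized above can be sketched in a few lines of
Python. This is only an illustrative sketch under stated assumptions, not the project's
implementation: synthetic 3-D points stand in for MFCC vectors, a dictionary stands in
for the HBase table, and the plain functions map_score and reduce_best stand in for
MapReduce tasks. All function names, speaker names, and parameters here are hypothetical.

```python
import math
import random


def nearest(vec, codebook):
    """Return (index, distance) of the codeword closest to vec."""
    dists = [math.dist(vec, c) for c in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, dists[idx]


def train_codebook(vectors, k, iters=20, seed=0):
    """Plain K-Means: compress many feature vectors into k centroids."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, codebook)[0]].append(v)
        # Recompute each centroid; keep the old one if its group is empty.
        codebook = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, codebook)
        ]
    return codebook


def map_score(speaker, codebook, query):
    """Map step (assumed form): average distortion against one speaker."""
    total = sum(nearest(v, codebook)[1] for v in query)
    return speaker, total / len(query)


def reduce_best(scored):
    """Reduce step (assumed form): keep the lowest-distortion speaker."""
    return min(scored, key=lambda pair: pair[1])


def fake_features(center, n, rng):
    """Stand-in for MFCC extraction: noisy 3-D points around a center."""
    return [tuple(c + rng.gauss(0, 0.3) for c in center) for _ in range(n)]


rng = random.Random(42)
centers = {"alice": (0.0, 0.0, 0.0), "bob": (5.0, 5.0, 5.0)}

# Training: one codebook per enrolled speaker (the dict plays the database role).
database = {
    name: train_codebook(fake_features(c, 200, rng), k=4)
    for name, c in centers.items()
}

# Matching: score the unknown voice against every entry, then pick the best.
query = fake_features(centers["bob"], 50, rng)
best_speaker, best_score = reduce_best(
    map_score(name, cb, query) for name, cb in database.items()
)
print(best_speaker)
```

Because each speaker is scored independently, the map step could run on a different
node for each database entry, with only the final minimum computed in one place.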