dc.description.abstract |
Purpose: This thesis presents the need for, the approaches to, and the results of identifying
difficult words in English text. Human Language Technology (HLT) increasingly helps people
with disabilities, non-native speakers, and individuals with low literacy improve their
communication skills and learn a language efficiently with the help of computers and other
technologies. Natural language processing, a branch of HLT, is an emerging field of computer
science that is widely used to process unstructured text. In education, difficult words not
only hinder the reading, writing, comprehension, and interpretation of text but also
contribute to the poorer academic achievement of hearing-impaired children compared with
their normal-hearing peers, because these children have a more limited vocabulary.
Speech-language pathologists/therapists (SLPs), teachers, and parents engage them in various
learning activities to expand their vocabulary. Presenting simplified text to such children
helps them use simple words in their daily routine and learn more words in a shorter period
of time. It also helps parents and teachers prepare reading and writing materials that are
easier for the children to learn from, and helps each child learn the language more easily.
This motivates the need for techniques that classify words in English textbooks and online
study material intended for these children as difficult or not difficult. The primary
objective of this research is therefore to propose and develop a technique that helps
identify difficult
English words in text. Methodology: We propose a methodology for identifying difficult words
in English text to help hearing-impaired children learn the language as simply as possible.
The text is first preprocessed, and a feature extraction technique is then applied to extract
features based on linguistic rules specific to hearing-impaired children. The C4.5 decision
tree machine learning algorithm then classifies each word in the text as difficult or not
difficult. The proposed technique was applied to different text documents to evaluate its
effectiveness. Results: The model achieves 92.5% accuracy when evaluated against the
annotated test dataset1 specific to hearing-impaired children, and 94.2% accuracy under
5-fold cross-validation. Words that the proposed methodology fails to classify correctly are
attributed to the absence of the corresponding linguistic rules in the training model. The
evaluation also shows that classification accuracy for difficult words depends strongly on
the dataset. |
en_US |