dc.description.abstract |
The rapid growth of digital data on the web has created the problem of information overload. Many
users find it difficult to obtain relevant information in a timely manner from huge online
repositories.
Automatic text summarization addresses this problem by compressing a text into a shorter form
that retains only the meaningful information, so that the user does not have to read every line
of a document to understand its core concept.
This thesis focuses on the design, implementation and analysis of an optimized fuzzy model for
feature-term-based automatic text summarization, which uses sentence extraction to generate
meaningful summaries of scientific documents.
First, the text document to be summarized is given to the system, and a preprocessing stage
removes noise from it to produce a clean document. The proposed model consists of three
methods. The first is the General Statistical Method (GSM), in which feature terms are extracted
through paragraph and sentence segmentation, followed by tokenization, stop-word removal, case
folding and removal of non-essential sentences from the document. Weights are then assigned to
the identified features (cue words, frequent words and sentence position), a score is calculated
for each sentence, and the highest-scoring sentences are extracted. In the
second method, the Fuzzy Logic Model (FL), the output of GSM and the identified features are
used as inputs to a fuzzy inference system (FIS). Based on its fuzzy rule set, the FIS extracts
the most important sentences from those selected by GSM for inclusion in the summary. In the
third method, the Optimized Fuzzy Model (OFM), the input and output fuzzy parameters as well as
the fuzzy rule weights are optimized to obtain an optimized weight for each feature. Then
the score of each sentence is calculated from these weights, and the highest-scoring sentences
are selected for inclusion in the final optimized summary document.
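The feature-weighted sentence scoring described above (GSM's weighting of cue words, frequent words and sentence position, and the final OFM scoring step) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the thesis implementation. The stop-word and cue-word lists, the per-feature normalization, the position formula and the names FeatureScoreSummarizer, score and summarize are hypothetical. The fuzzy inference stage is omitted. The weight array stands in for the feature weights that GSM assigns and OFM optimizes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Minimal sketch of feature-weighted extractive summarization (hypothetical names and word lists).
class FeatureScoreSummarizer {

    // Illustrative stop-word and cue-word lists (assumption: the thesis's actual lists are not given).
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "are", "in", "is", "of", "the", "to", "we");
    static final Set<String> CUE_WORDS =
            Set.of("conclude", "propose", "results", "significant");

    // Case folding, tokenization on non-alphanumeric characters, and stop-word removal.
    static List<String> terms(String sentence) {
        List<String> out = new ArrayList<>();
        for (String tok : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!tok.isEmpty() && !STOP_WORDS.contains(tok)) {
                out.add(tok);
            }
        }
        return out;
    }

    // score_i = w[0]*cue + w[1]*frequency + w[2]*position, where each feature lies in [0, 1].
    static double[] score(List<String> sentences, double[] w) {
        int n = sentences.size();
        List<List<String>> termLists = new ArrayList<>();
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            List<String> t = terms(s);
            termLists.add(t);
            for (String term : t) {
                freq.merge(term, 1, Integer::sum); // document-wide term frequency
            }
        }
        int maxFreq = freq.values().stream().max(Integer::compare).orElse(1);
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            List<String> t = termLists.get(i);
            double cue = 0;
            double frequent = 0;
            for (String term : t) {
                if (CUE_WORDS.contains(term)) {
                    cue++;
                }
                frequent += freq.get(term) / (double) maxFreq;
            }
            double len = Math.max(1, t.size());
            double position = 1.0 - (double) i / n; // earlier sentences score higher
            scores[i] = w[0] * (cue / len) + w[1] * (frequent / len) + w[2] * position;
        }
        return scores;
    }

    // Returns the k highest-scoring sentences, restored to their original document order.
    static List<String> summarize(List<String> sentences, double[] w, int k) {
        double[] sc = score(sentences, w);
        Integer[] idx = new Integer[sentences.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Stable sort: sentences with equal scores keep their document order.
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> sc[i]).reversed());
        Integer[] top = Arrays.copyOf(idx, Math.min(k, idx.length));
        Arrays.sort(top);
        List<String> out = new ArrayList<>();
        for (int i : top) {
            out.add(sentences.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = List.of(
                "Text summarization reduces reading time.",
                "The weather was pleasant.",
                "We propose a fuzzy model and the results are significant.");
        // Hand-picked weights for illustration; in OFM these would be the optimized feature weights.
        System.out.println(summarize(doc, new double[] {0.5, 0.3, 0.2}, 2));
    }
}
```

Returning the selected sentences in document order rather than score order keeps the extract readable. Changing only the weight array is enough to move from GSM-style fixed weights to optimized ones.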
The proposed technique is implemented in Java using NetBeans IDE 6.9.1 and the jFuzzyLogic 2.1a
package. To evaluate the system, the summaries generated by each of the three methods are
compared with a gold-standard (human-generated) summary, with each other, and with other
summarizers such as the MS-Word 2007 summarizer and the Essential summarizer, for a
comprehensive performance analysis. Precision, Recall and F-measure are calculated for each
generated summary. |
en_US |
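The Precision, Recall and F-measure named in the abstract above can be computed as sketched below. This sketch assumes sentence-level matching: a summary is represented by the indices of its extracted sentences and compared with the indices chosen in the gold-standard summary. The thesis may instead match at word or term level. The class name SummaryEvaluation and the sample indices are hypothetical. With S the system summary and G the gold standard, P = |S ∩ G| / |S|, R = |S ∩ G| / |G| and F = 2PR / (P + R).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sentence-level Precision, Recall and F-measure against a gold-standard summary (sketch).
class SummaryEvaluation {

    // Returns {precision, recall, fMeasure}; sentences are identified by their index in the source.
    static double[] prf(Set<Integer> system, Set<Integer> gold) {
        Set<Integer> common = new HashSet<>(system);
        common.retainAll(gold); // sentences chosen by both the system and the human summarizer
        double p = system.isEmpty() ? 0 : (double) common.size() / system.size();
        double r = gold.isEmpty() ? 0 : (double) common.size() / gold.size();
        double f = (p + r == 0) ? 0 : 2 * p * r / (p + r); // harmonic mean of P and R
        return new double[] {p, r, f};
    }

    public static void main(String[] args) {
        // Hypothetical indices of system-extracted and human-selected sentences.
        Set<Integer> system = Set.of(0, 2, 5, 7);
        Set<Integer> gold = Set.of(0, 2, 3, 7, 9);
        double[] m = prf(system, gold);
        System.out.printf(Locale.ROOT, "P=%.2f R=%.2f F=%.2f%n", m[0], m[1], m[2]);
        // prints P=0.75 R=0.60 F=0.67
    }
}
```

Guarding the empty-set and P + R = 0 cases avoids division by zero when a summarizer extracts nothing that overlaps the gold standard.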