Word Spotting in Document Image Analysis

Hafiz Adnan Niaz

dc.contributor.author	Hafiz Adnan Niaz
dc.date.accessioned	2020-12-31T11:20:45Z
dc.date.available	2020-12-31T11:20:45Z
dc.date.issued	2017
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/20272
dc.description	Supervisor: Usman Akram	en_US
dc.description.abstract	The amount of digital information around us has grown remarkably over the last two decades, and almost every type of information can be accessed within a few clicks. Like other sources, paper documents have also been digitized, giving readers rapid access. This digitization of documents and books is only effective if it is complemented by a search mechanism that allows users to retrieve the desired content. This need has led to extensive research on Optical Character Recognition (OCR) systems, which convert document images into text and thereby enable search and retrieval. Although OCR has been an established research area for many years, for many scripts OCR systems are either non-existent or in the early days of research. In some cases, recognition of text is very challenging due to the complexity of the script. To address these issues, word spotting retrieves the documents containing occurrences of a given query word by matching the shapes of words, without any knowledge of their semantics. This work presents a word spotting based indexing and retrieval system for digitized English documents. A document image with English text is segmented into ligatures, and each ligature is represented by a set of features. The ligatures are then clustered to group similar ligatures together. For indexing, a document is segmented into ligatures and each ligature is classified into one of the ligature classes. An index file is maintained for each cluster, storing all occurrences (locations) of the ligature in the given document. During the retrieval phase, a query word presented to the system is segmented into ligatures and each ligature is matched against the existing clusters. Finally, for each ligature in the query word, the documents containing occurrences of that ligature are retrieved using the index file.	en_US
dc.publisher	EME, National University of Science and Technology, Islamabad	en_US
dc.subject	Computer Engineering	en_US
dc.title	Word Spotting in Document Image Analysis	en_US
dc.type	Thesis	en_US
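
The indexing and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes ligatures have already been segmented and reduced to numeric feature vectors, that each cluster is summarized by a centroid, and that a ligature is assigned to its nearest centroid by Euclidean distance. The function names (`nearest_cluster`, `build_index`, `retrieve`), the toy data, and the choice to intersect the per-ligature document sets are all hypothetical; the abstract does not say how per-ligature results are combined.

```python
from collections import defaultdict
from math import dist


def nearest_cluster(feature, centroids):
    """Return the index of the centroid closest to the feature vector."""
    return min(range(len(centroids)), key=lambda i: dist(feature, centroids[i]))


def build_index(documents, centroids):
    """Map each cluster id to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, ligatures in documents.items():
        for position, feature in enumerate(ligatures):
            cluster = nearest_cluster(feature, centroids)
            index[cluster].append((doc_id, position))
    return index


def retrieve(query_ligatures, index, centroids):
    """Return ids of documents that contain every ligature cluster of the query.

    Intersecting the per-ligature document sets is an assumption made here.
    """
    result = None
    for feature in query_ligatures:
        cluster = nearest_cluster(feature, centroids)
        docs = {doc_id for doc_id, _ in index.get(cluster, [])}
        result = docs if result is None else result & docs
    return result or set()


# Toy data: three hypothetical cluster centroids and two documents,
# each given as a list of 2-D ligature feature vectors.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
documents = {
    "doc_a": [(0.1, 0.0), (0.9, 1.1)],
    "doc_b": [(5.1, 4.9), (0.0, 0.2)],
}

index = build_index(documents, centroids)
hits = retrieve([(0.05, 0.05), (1.0, 0.9)], index, centroids)
print(hits)
```

Storing the position alongside the document id mirrors the abstract's "occurrences (locations)"; a fuller system could use those positions to check that the query ligatures appear next to each other in the expected order.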