Urdu/Arabic Information Retrieval System (Urdu/Arabic OCR – Image and Text Handling)

Qambber Hussain Syed


dc.contributor.author	Qambber Hussain Syed
dc.date.accessioned	2020-10-22T13:50:46Z
dc.date.available	2020-10-22T13:50:46Z
dc.date.issued	2001
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/3519
dc.description	Supervisor: Mr. Saqib Mir	en_US
dc.description.abstract	The project deals with the creation of a multilingual information retrieval system. The objective was to build a search engine that searches the internet for Urdu/Arabic (multilingual) information and displays the results to the user. The project was implemented in two parts, each made up of its own modules. The first part is Optical Character Recognition (OCR) software, which takes images of text as input and produces raw Unicode characters as output. Artificial neural networks identify the text in the images and then assign the corresponding Unicode character to the identified text. The second part comprises: a user interface for entering the search criteria (an Urdu/Arabic query) and a URL as the starting point of the search; a web crawler that searches HTML web pages for Urdu/Arabic text images; an XML document (database) that structures the unstructured multimedia data (images) gathered from the internet by the crawler; the application of digital image processing concepts to filter and standardize the images; the storage, in XML documents, of the Urdu/Arabic text (Unicode characters) extracted from the images by the OCR; query comparison algorithms that match the retrieved text against the user-supplied query; ranking of the results; and, last but not least, display of the results to the user. This report explains the second part of the Urdu/Arabic Information Retrieval System, termed "Urdu/Arabic OCR – Image and Text Handling".	en_US
dc.publisher	National University of Sciences and Technology	en_US
dc.subject	Information Technology	en_US
dc.title	Urdu/Arabic Information Retrieval System (Urdu/Arabic OCR – Image and Text Handling)	en_US
dc.type	Thesis	en_US
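The abstract says the OCR output (Unicode text) is saved in XML documents and then compared against the user's query and ranked, but it does not give the XML schema or the comparison and ranking algorithms. As a rough illustration only, the sketch below assumes a hypothetical schema (one `<image>` element per crawled image, with `src` and `page` attributes and a `<text>` child) and ranks images by simple query-term overlap. The function names `add_record` and `rank`, the schema, and the example.com URLs are all placeholders; this is not the method used in the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema (assumption): one <image> element per crawled image,
# holding its source URL, the page it was found on, and the OCR'd Unicode text.
def add_record(root: ET.Element, image_url: str, page_url: str, text: str) -> None:
    """Append one crawled image and its OCR output to the XML store."""
    image = ET.SubElement(root, "image", src=image_url, page=page_url)
    ET.SubElement(image, "text").text = text

def rank(root: ET.Element, query: str) -> list[tuple[int, str]]:
    """Score each stored image by how many distinct query terms occur in its text.

    Term-overlap scoring is an assumed stand-in for the thesis's unspecified
    query comparison algorithms. Images with no matching term are dropped.
    """
    terms = set(query.split())  # Urdu/Arabic words are assumed space-separated
    scored = []
    for image in root.iter("image"):
        words = set((image.findtext("text") or "").split())
        score = len(terms & words)
        if score > 0:
            scored.append((score, image.get("src")))
    # Highest score first; Python's sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

if __name__ == "__main__":
    root = ET.Element("results")
    add_record(root, "http://example.com/a.gif", "http://example.com/", "پاکستان زندہ باد")
    add_record(root, "http://example.com/b.gif", "http://example.com/", "اردو زبان")
    print(rank(root, "اردو پاکستان زبان"))
    # → [(2, 'http://example.com/b.gif'), (1, 'http://example.com/a.gif')]
```

Term overlap was chosen only because it is the simplest scoring that yields an ordering; weighting schemes such as TF-IDF could replace the `score` line without changing the XML handling.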