NUST Institutional Repository

Urdu/Arabic Information Retrieval System (Urdu/Arabic OCR – Image and Text Handling)

dc.contributor.author Qambber Hussain Syed
dc.date.accessioned 2020-10-22T13:50:46Z
dc.date.available 2020-10-22T13:50:46Z
dc.date.issued 2001
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/3519
dc.description Supervisor: Mr. Saqib Mir en_US
dc.description.abstract The project deals with the creation of a multilingual information retrieval system. The objective was to build a search engine that searches the Internet for Urdu/Arabic (multilingual) information and shows the results to the user. The project was implemented in two parts, with the modules divided between them as follows. The first part is Optical Character Recognition (OCR) software, which takes textual images as input and produces raw Unicode characters as output; artificial neural networks identify the text in the images and then assign the corresponding Unicode character to the identified text. The second part comprises: a user interface for entering the search criteria (an Urdu/Arabic query) and a URL from which to start the search; a web crawler that searches HTML web pages for Urdu/Arabic textual images; an XML document (database) that structures the unstructured multimedia data (images) retrieved from the Internet by the crawler; the application of digital image processing concepts to filter the images for standardization; storage, in XML documents, of the Urdu/Arabic text (Unicode characters) extracted from the images by the OCR; processing of the retrieved text against the user-supplied query with query comparison algorithms; ranking of the results; and, finally, display of the results to the user. This report describes the second part of the Urdu/Arabic Information Retrieval System, titled "Urdu/Arabic OCR – Image and Text Handling". en_US
dc.publisher National University of Sciences and Technology en_US
dc.subject Information Technology en_US
dc.title Urdu/Arabic Information Retrieval System (Urdu/Arabic OCR – Image and Text Handling) en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • BS [440]