Abstract:
Urdu information on the Internet is available in the form of images because no standard Urdu font exists, which makes retrieving such information on demand almost impossible. To counter this problem, an Optical Character Recognition (OCR) system is needed which, when combined with a customized search engine, can retrieve textual Urdu information from images and furnish it on demand. There has been only limited success in the field of Urdu Optical Character Recognition, owing mostly to difficulties such as the cursiveness and overlapping of most Urdu fonts. The proposed system aims to overcome these problems but, due to time limitations, can identify only one font type with reasonable accuracy; its modular structure, however, allows more fonts to be added.
The approach uses Artificial Neural Networks, as image classification is tedious with conventional computational methods. Artificial Neural Networks are computational models that emulate the human brain at a very small scale [22]. Their effectiveness in image identification and classification has been demonstrated [20]. The system takes an image as input, breaks it down, and processes the pieces internally to identify them; the recognized Urdu characters are then output as Unicode.
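As an illustration only, the final stage of such a pipeline (scoring an already segmented glyph and emitting a Unicode character) might be sketched as follows. Everything in this sketch is an assumption made for the example rather than the proposed system itself: the three-letter label set, the 2x2 "glyphs" flattened to four pixels, the hand-set weights standing in for a trained network, and the function names `forward`, `classify`, and `recognize`. Segmentation of the input image is assumed to have already happened.

```python
# Hypothetical label set: three Urdu letters given as Unicode code points.
LABELS = ["\u0627", "\u0628", "\u067E"]  # alef, beh, peh

# Hand-set toy parameters standing in for trained network weights (assumption).
WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0],  # responds to pixel 0 -> alef
    [0.0, 1.0, 0.0, 0.0],  # responds to pixel 1 -> beh
    [0.0, 0.0, 1.0, 0.0],  # responds to pixel 2 -> peh
]
BIASES = [0.0, 0.0, 0.0]


def forward(pixels, weights, biases):
    """Single dense layer: one score per label, score_k = w_k . x + b_k."""
    return [
        sum(w * x for w, x in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]


def classify(pixels, weights=WEIGHTS, biases=BIASES):
    """Return the Unicode character whose score is highest."""
    scores = forward(pixels, weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]


def recognize(glyphs, weights=WEIGHTS, biases=BIASES):
    """Classify each pre-segmented glyph and join the results in input order."""
    return "".join(classify(g, weights, biases) for g in glyphs)


# Two toy glyphs, each a flattened 2x2 binary image.
text = recognize([[0, 1, 0, 0], [1, 0, 0, 0]])
```

A real recognizer would use a trained multi-layer network and a much larger label set; the sketch only shows how class indices can be mapped to Unicode output.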