NUST Institutional Repository

Scene Analysis for the Visually Impaired (SAVI)

dc.contributor.author Mirza Muhammad Ali Baig, Muhammad Abdullah Wajahat, Mian Ihtisham Shah
dc.date.accessioned 2021-01-05T07:52:41Z
dc.date.available 2021-01-05T07:52:41Z
dc.date.issued 2018
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/20514
dc.description Supervisor: Dr Omar Arif en_US
dc.description.abstract Image captioning is a rapidly progressing field within artificial intelligence, and it holds considerable potential for assisting people who are visually impaired. The field has matured over the past few years, and many implementations can now adequately caption an image. Zero-Shot Learning, another concept that has gained traction in recent years, aims to overcome the shortcomings of training datasets by identifying unseen classes through semantic concepts present in the data. A major problem in image captioning is the limited amount of training data available. The only dataset widely considered suitable for the task is the Microsoft Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images covering roughly 80 object classes, an insufficient number if these techniques are to be targeted at real-life use. To overcome this problem, we propose a solution that incorporates Zero-Shot Learning concepts to identify unknown objects and classes using semantic word embeddings and existing state-of-the-art object-identification algorithms, making the image captioning algorithm more robust and better suited to real-life use. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and operates on the generator's output to inject objects that are not present in the dataset into the caption. We report a 74% positive correction ratio over the captions generated by the underlying generator, where the ratio represents the proportion of changes in which an object was correctly identified and injected into the caption. en_US
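The injection step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the toy three-dimensional word vectors and the object labels below are invented stand-ins for real semantic embeddings (e.g. word2vec or GloVe), and the replacement rule, swapping the caption word most similar to a detected novel object, is one plausible reading of the word-injection idea.

```python
import math

# Toy word vectors standing in for real semantic embeddings.
# All names and values here are illustrative, not from the thesis.
EMBEDDINGS = {
    "dog":   [0.90, 0.10, 0.00],
    "cat":   [0.80, 0.20, 0.10],
    "zebra": [0.70, 0.10, 0.30],
    "horse": [0.75, 0.12, 0.25],
    "grass": [0.10, 0.90, 0.20],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def inject_novel_object(caption, novel_object):
    """Replace the caption word semantically closest to the detected
    novel object with the novel object's label."""
    target = EMBEDDINGS[novel_object]
    words = caption.split()
    best_idx, best_sim = None, -1.0
    for i, word in enumerate(words):
        vec = EMBEDDINGS.get(word.lower())
        if vec is None:  # skip words without an embedding (articles, etc.)
            continue
        sim = cosine(vec, target)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx is not None:
        words[best_idx] = novel_object
    return " ".join(words)

# Suppose the pre-trained captioner produced "a horse standing in the grass",
# but an object detector found a zebra (a class absent from MSCOCO's 80).
print(inject_novel_object("a horse standing in the grass", "zebra"))
# → a zebra standing in the grass
```

Because "horse" is the caption word whose embedding lies closest to "zebra", it is the one replaced, while semantically distant words such as "grass" are left untouched.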
dc.publisher SEECS, National University of Sciences and Technology, Islamabad en_US
dc.subject Computer Science en_US
dc.title Scene Analysis for the Visually Impaired (SAVI) en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • BS [211]
