NUST Institutional Repository

Scene Understanding Image Description In Natural Language

dc.contributor.author Imran Khurram
dc.date.accessioned 2020-12-07T11:18:54Z
dc.date.available 2020-12-07T11:18:54Z
dc.date.issued 2012
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/16581
dc.description Supervisor: Dr. Muhammad Moazam Fraz en_US
dc.description.abstract Automatic image captioning is an active and highly challenging research problem in computer vision that aims to understand the contents of a scene and describe them in human-understandable language. Existing image captioning solutions are based on holistic approaches in which the whole image is described at once, potentially losing important aspects of the scene. To enable more detailed captioning, we propose Dense CaptionNet, a deep, region-based, modular image captioning architecture that extracts and describes each region of the image individually so that more details of the scene are included in the overall caption. The proposed architecture consists of three main modules. The first generates region descriptions that include not only objects but also the relationships between them, and the second generates the attributes of those objects. The textual descriptions produced by these two modules are fused into a text file that serves as input to the third, sentence generation module. This module is an encoder-decoder framework that merges the descriptions into a single meaningful, grammatically correct sentence detailed enough to describe the whole scene more completely. The architecture is trained on the Visual Genome, IAPR TC-12 and MSCOCO datasets and tested on an unseen subset of IAPR TC-12, which was chosen because of the detailed nature of its descriptions. On standard evaluation metrics, the trained architecture outperforms existing state-of-the-art techniques such as Neural Talk and Show, Attend and Tell, especially on complex scenes. en_US
dc.publisher SEECS, National University of Sciences and Technology, Islamabad en_US
dc.subject Image captioning, computer vision, deep learning, dense image captioning, RNN, visual captioning, scene understanding en_US
dc.title Scene Understanding Image Description In Natural Language en_US
dc.type Thesis en_US


Files in this item

This item appears in the following Collection(s)

  • MS [376]
