Scene Understanding Image Description In Natural Language

Imran Khurram


dc.contributor.author	Imran Khurram
dc.date.accessioned	2020-12-07T11:18:54Z
dc.date.available	2020-12-07T11:18:54Z
dc.date.issued	2012
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/16581
dc.description	Supervisor: Dr. Muhammad Moazam Fraz	en_US
dc.description.abstract	Automatic image captioning is an active and highly challenging research problem in computer vision that aims to understand the contents of a scene and describe them in human-understandable language. Existing image captioning solutions are based on holistic approaches that describe the whole image at once, potentially losing important aspects of the scene. To enable more detailed captioning, we propose Dense CaptionNet, a deep region-based modular image captioning architecture that extracts and describes each region of the image individually so that more details of the scene are included in the overall caption. The proposed architecture consists of three main modules. The first generates region descriptions that cover not only objects but also the relationships between them, and the second generates the attributes of those objects. The textual descriptions produced by these two modules are fused into a text file that serves as input to the third, a sentence generation module. This module uses an encoder-decoder framework to merge the descriptions into a single meaningful, grammatically correct sentence that is detailed enough to describe the whole scene. The proposed architecture is trained on the Visual Genome, IAPR TC-12 and MSCOCO datasets and tested on an unseen set of the IAPR TC-12 dataset because of the detailed nature of its descriptions. The trained architecture outperforms existing state-of-the-art techniques, e.g., Neural Talk and Show, Attend and Tell, on standard evaluation metrics, especially on complex scenes.	en_US
dc.publisher	SEECS, National University of Sciences and Technology, Islamabad	en_US
dc.subject	Image captioning, computer vision, deep learning, dense image captioning, RNN, visual captioning, scene understanding	en_US
dc.title	Scene Understanding Image Description In Natural Language	en_US
dc.type	Thesis	en_US
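The abstract describes a fusion step in which the outputs of the region-description module and the attribute module are merged into one text input for the encoder-decoder sentence generator. The sketch below illustrates one possible version of that step under stated assumptions. It is not the thesis's implementation. The function names `fuse_descriptions` and `tokenize`, the `<sep>` separator token, the "attribute object" phrasing and the case-insensitive de-duplication are all hypothetical choices. It also returns a string rather than writing the text file mentioned in the abstract.

```python
from typing import Dict, List

# Hypothetical separator token placed between fused phrases (an assumption,
# not taken from the thesis).
SEP = "<sep>"


def fuse_descriptions(region_descriptions: List[str],
                      attributes: Dict[str, List[str]]) -> str:
    """Merge region phrases and per-object attributes into one text input.

    Assumptions: attributes are rendered as "<attribute> <object>" phrases,
    whitespace and case are normalized, and exact duplicates are dropped
    while keeping first-seen order.
    """
    phrases: List[str] = []
    seen = set()

    def add(phrase: str) -> None:
        # Lowercase and collapse whitespace so near-identical region outputs
        # (e.g. differing only in capitalization) are counted once.
        normalized = " ".join(phrase.lower().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            phrases.append(normalized)

    # Region descriptions first (objects and their relationships) ...
    for description in region_descriptions:
        add(description)
    # ... then attribute phrases for each detected object.
    for obj, attrs in attributes.items():
        for attr in attrs:
            add(f"{attr} {obj}")

    return f" {SEP} ".join(phrases)


def tokenize(fused: str) -> List[str]:
    """Whitespace tokenization of the fused text for an encoder's input."""
    return fused.split()


if __name__ == "__main__":
    regions = ["a man riding a horse", "A man riding a horse",
               "green grass on the field"]
    attrs = {"horse": ["brown"], "man": ["young"]}
    print(fuse_descriptions(regions, attrs))
    # prints: a man riding a horse <sep> green grass on the field <sep> brown horse <sep> young man
```

Separating the fusion step from the sentence generator in this way means the encoder-decoder only ever sees plain text. This reflects the modular design described in the abstract, where each module can be trained or replaced independently.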