NUST Institutional Repository

A Step-One Visual Learning App for children leveraging Knowledge-Aware Visual Question Answering Framework

dc.contributor.author Jamshed, Ahmed
dc.date.accessioned 2022-01-31T09:13:52Z
dc.date.available 2022-01-31T09:13:52Z
dc.date.issued 2022
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/28538
dc.description.abstract Visual Question Answering (VQA) is a complex cognitive multimodal inference problem that combines recent techniques from Natural Language Processing (NLP) and Computer Vision (CV) to answer open-ended, free-form natural language questions by understanding the visual analogies. Previously, VQA models performed poorly on knowledge-based and commonsense-aware questions. This changed with the introduction of transformers, because transformer-based language models implicitly possess some degree of knowledge and commonsense. External knowledge can also be supplied explicitly using Knowledge Representation and Reasoning (KRR) techniques to further improve benchmark performance. To train and benchmark these knowledge-aware VQA models, several datasets such as OK-VQA, GQA, and KVQA have been introduced, in which questions require some form of cognitive inference over available external knowledge. These datasets are carefully crafted for their intended purpose, but they are static: they contain only a limited set of questions and images. This paper presents a framework capable of producing multiple relevant knowledge-aware multiple-choice questions (MCQs) for each unique image, using the knowledge-rich corpus from Wikipedia. These MCQs can be used to prepare dynamic knowledge-aware VQA datasets. The framework can also be used to build a visual learning app that educates children interactively, especially in remote areas of developing countries where they seldom get a chance to learn new concepts in a proper school environment. en_US
dc.description.sponsorship Dr. Muhammad Moazam Fraz en_US
dc.language.iso en en_US
dc.publisher SEECS, NUST en_US
dc.subject Visual Question Answering, Image Semantic Understanding, Natural Language Processing, Transformers, Automated Datasets en_US
dc.title A Step-One Visual Learning App for children leveraging Knowledge-Aware Visual Question Answering Framework en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • MS [375]
