NUST Institutional Repository

Generalized Zero-Shot Learning for Visual Object Recognition


dc.contributor.author Gull, Muqaddas
dc.date.accessioned 2025-01-30T07:38:11Z
dc.date.available 2025-01-30T07:38:11Z
dc.date.issued 2024
dc.identifier.other 278643
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/49369
dc.description Supervisor: Dr. Omar Arif; Co-Supervisor: Dr. Rafia Mumtaz en_US
dc.description.abstract Modern computer vision systems rely heavily on extensive training data for accurate classification, yet collecting and annotating such data is often impractical. To address this challenge, zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) have emerged, enabling the recognition of previously unseen classes by transferring semantic information. This thesis explores the role of generative models and disentangled representation learning in ZSL, GZSL, Multi-label ZSL, and Multi-label GZSL. The first approach integrates conditional variational autoencoders (CVAE) and conditional generative adversarial networks (CGAN) for ZSL and GZSL, with the feature generation process constrained by a regressor network through a cycle-consistency loss. This framework is then extended to Multi-label ZSL and GZSL, targeting the accurate classification of images containing multiple classes that were absent during training. The second approach centers on learning disentangled representations to improve the quality of the generated data; in a disentangled representation, each latent component is sensitive to changes in a single generative factor. To achieve this, we employ an identifiable variational autoencoder (iVAE), derived from the VAE framework and capable of addressing both conventional ZSL and GZSL. We also extend this approach to Multi-label ZSL and GZSL, incorporating global image-level semantic information to obtain semantically consistent representations. The proposed methods are evaluated on a range of ZSL, GZSL, Multi-label ZSL, and Multi-label GZSL datasets: initial ZSL and GZSL experiments use the standard CUB, AWA1, AWA2, SUN, and aPY benchmarks, while Multi-label ZSL and GZSL experiments use two large benchmark datasets, NUS-WIDE and MS COCO. The results show that the disentangled representation learning approach performs strongly compared to the generative models, improving ZSL accuracy by 4.2%, 2.0%, 0.1%, 4.4%, and 1.0% on CUB, AWA1, AWA2, SUN, and aPY, respectively, with significant GZSL improvements in terms of harmonic mean. For Multi-label ZSL and GZSL, the disentangled approach delivers substantial gains in mean Average Precision (mAP) over the generative approach, including improvements of 1.4% on NUS-WIDE and 2.1% on MS COCO for Multi-label ZSL. en_US
dc.language.iso en en_US
dc.publisher School of Electrical Engineering and Computer Science (SEECS), NUST en_US
dc.subject Disentangled Representation Learning, Zero-Shot Learning, Generalized Zero-Shot Learning, Identifiable VAE, Generative, Attribute-Level Feature Fusion en_US
dc.title Generalized Zero-Shot Learning for Visual Object Recognition en_US
dc.type Thesis en_US
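
As a rough illustration of the first approach described in the abstract, the sketch below shows the conditional-GAN and cycle-consistency part of the pipeline: a generator synthesizes visual features from noise plus class attributes, a discriminator scores feature/attribute pairs, and a regressor maps generated features back to their conditioning attributes to impose the cycle-consistency loss. This is a minimal PyTorch sketch under assumed dimensions and hyperparameters (feature and attribute sizes chosen to resemble ResNet features and AWA attributes, loss weight 10.0), not the thesis implementation, and it omits the CVAE component.

# Minimal sketch of conditional feature generation with a regressor-based
# cycle-consistency constraint. Sizes and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, ATTR_DIM, Z_DIM = 2048, 85, 64  # assumed: CNN features, attribute vector, noise

class Generator(nn.Module):
    """G(z, a): noise + class attributes -> synthetic visual features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + ATTR_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM), nn.ReLU())

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=1))

class Discriminator(nn.Module):
    """D(x, a): scores feature/attribute pairs as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + ATTR_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1))

    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=1))

class Regressor(nn.Module):
    """R(x): visual features -> attributes; drives cycle-consistency."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM, 512), nn.ReLU(),
            nn.Linear(512, ATTR_DIM))

    def forward(self, x):
        return self.net(x)

G, D, R = Generator(), Discriminator(), Regressor()
opt_g = torch.optim.Adam(list(G.parameters()) + list(R.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

x_real = torch.randn(32, FEAT_DIM)  # stand-in for seen-class CNN features
a = torch.rand(32, ATTR_DIM)        # stand-in for class attribute vectors

# Discriminator step: real pairs vs. detached generated pairs.
x_fake = G(torch.randn(32, Z_DIM), a).detach()
loss_d = (bce(D(x_real, a), torch.ones(32, 1)) +
          bce(D(x_fake, a), torch.zeros(32, 1)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool D, and require R to recover the conditioning
# attributes from the generated features (cycle-consistency).
x_fake = G(torch.randn(32, Z_DIM), a)
loss_cyc = F.mse_loss(R(x_fake), a)
loss_g = bce(D(x_fake, a), torch.ones(32, 1)) + 10.0 * loss_cyc
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

At test time such a generator is typically used to synthesize features for unseen classes from their attribute vectors, after which an ordinary classifier is trained on the synthetic features (plus real seen-class features in the GZSL setting).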
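The second approach can be sketched in the same spirit. In an iVAE the latent prior is conditioned on an auxiliary variable, here the class attributes, which is what gives the model its identifiability; the KL term then pulls the posterior toward a class-conditioned factorized prior rather than a fixed standard normal. Again a minimal sketch under assumed architectures and dimensions, not the thesis code.

# Minimal iVAE-style sketch: encoder q(z|x,a), decoder p(x|z), and a
# learned conditional prior p(z|a). All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, ATTR_DIM, Z_DIM = 2048, 85, 64

class CondPrior(nn.Module):
    """p(z | a): conditionally factorized Gaussian prior over latents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(ATTR_DIM, 2 * Z_DIM)

    def forward(self, a):
        mu, logvar = self.net(a).chunk(2, dim=1)
        return mu, logvar

class Encoder(nn.Module):
    """q(z | x, a): amortized diagonal-Gaussian posterior."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM + ATTR_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, 2 * Z_DIM))

    def forward(self, x, a):
        mu, logvar = self.net(torch.cat([x, a], dim=1)).chunk(2, dim=1)
        return mu, logvar

class Decoder(nn.Module):
    """p(x | z): reconstructs visual features from latents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, FEAT_DIM))

    def forward(self, z):
        return self.net(z)

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over latent dims."""
    return 0.5 * (logvar_p - logvar_q
                  + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                  - 1).sum(dim=1)

enc, dec, prior = Encoder(), Decoder(), CondPrior()
x = torch.randn(32, FEAT_DIM)  # stand-in visual features
a = torch.rand(32, ATTR_DIM)   # stand-in class attributes

mu_q, logvar_q = enc(x, a)
z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()  # reparameterization
mu_p, logvar_p = prior(a)
loss = (F.mse_loss(dec(z), x, reduction="none").sum(dim=1)
        + gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)).mean()
loss.backward()  # in practice, enc/dec/prior are optimized jointly

Once trained, latents for an unseen class can be sampled from the learned conditional prior given that class's attributes and decoded into features, which is what links the identifiable, disentangled latent space to ZSL/GZSL feature generation.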

