dc.description.abstract |
Modern computer vision tasks rely heavily on extensive training data for accurate classification.
However, the collection and annotation of such data can be impractical.
To address this challenge, zero-shot learning (ZSL) and generalized zero-shot learning
(GZSL) have emerged, enabling the recognition of previously unseen classes by
transferring semantic information. This thesis explores the significance of generative
models and disentangled representation learning for ZSL, GZSL, Multi-label ZSL, and
Multi-label GZSL. The first approach integrates a conditional variational autoencoder
(CVAE) and a conditional generative adversarial network (CGAN) for ZSL and GZSL.
The feature generation process is constrained by a Regressor network through a
cycle-consistency loss. Expanding on this framework, the approach extends to
Multi-label ZSL and GZSL, aiming to accurately classify images containing multiple
unseen classes that were absent during training.
The second approach centers on learning disentangled representations to improve
the quality of the generated data. In a disentangled representation, each latent
component is sensitive to changes in a single generative factor. To accomplish this,
we introduce an identifiable variational autoencoder (iVAE), derived from the VAE
framework and capable of addressing both conventional ZSL and GZSL. We also extend
this approach to Multi-label ZSL and GZSL, incorporating global image-level semantic
information to obtain semantically consistent representations.
The proposed methods are evaluated on a range of ZSL, GZSL, Multi-label
ZSL, and Multi-label GZSL datasets. For ZSL and GZSL, we first evaluate them
on the standard datasets CUB, AWA1, AWA2, SUN, and aPY. For Multi-label ZSL
and GZSL, we conduct comprehensive experiments on two large-scale benchmark
datasets: NUS-WIDE and MS COCO.
The results demonstrate that our disentangled representation learning approach
performs strongly compared with other generative models. For ZSL, it improves
accuracy over the generative models by 4.2%, 2.0%, 0.1%, 4.4%, and 1.0% on CUB,
AWA1, AWA2, SUN, and aPY, respectively. For GZSL, we also observe significant
accuracy improvements, particularly in terms of the harmonic mean. For Multi-label
ZSL and GZSL, the disentangled representation learning approach delivers
substantial improvements over the generative approach, particularly in mean
Average Precision (mAP), including a 1.4% gain on the NUS-WIDE dataset and a
2.1% gain on the MS COCO dataset for Multi-label ZSL. |
en_US |