Generalization and Specialization in Zero-Shot Learning

Access & Terms of Use
embargoed access
Embargoed until 2025-02-14
Copyright: Li, Yun
Abstract
Image classification has achieved remarkable success thanks to deep learning techniques and vast amounts of labeled data. However, in real-world scenarios the data distribution is long-tailed, which makes acquiring sufficient labels difficult and hinders the performance of deep models. To overcome these obstacles, Zero-Shot Learning (ZSL) has been proposed. ZSL aims to transfer classification ability from seen to unseen classes, using semantic side information as the bridge. The success of ZSL requires two crucial abilities: the generalization ability to transfer classification capability to unseen classes, and the specialization ability to extract discriminative features. This thesis investigates both abilities to address ZSL and its variant, Generalized Zero-Shot Learning (GZSL), where test images can come from both seen and unseen classes. To enhance the generalization ability, we employ a generative network and adapt it to diverse task characteristics to synthesize visual features of unseen classes, and we incorporate meta-learning to eliminate inherent biases towards seen classes. To improve the specialization ability, we increase the visual distinction between features by dynamically discovering global-cooperative localities and progressively aggregating them based on visual correlations. We further introduce spiral learning to improve locality learning with semantic generalization; it revisits visual representations guided by a series of attribute groups to understand complex semantic relationships. However, focusing solely on one of these abilities may yield a model that is either overly general, with decreased classification performance, or too specialized to generalize effectively to unseen classes. Therefore, we propose to equip the model with both abilities simultaneously and to balance them at the instance and dataset levels via a self-adjusted diversity loss and a linearly annealed updating schedule.
Additionally, we extend our approach to another ZSL scenario, Compositional ZSL (CZSL), where labels are combinations of attributes and objects, and conduct experiments in Open-World settings (OW-CZSL). In this setting, we enhance specialization through non-local and local attention mechanisms and improve generalization by disentangling attribute and object features. In summary, we propose novel frameworks for ZSL, GZSL, and CZSL that improve and balance the generalization and specialization abilities, achieving state-of-the-art performance in different settings on multiple benchmarks.
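The difference between the ZSL and GZSL settings described in the abstract can be illustrated with a minimal sketch. This is a generic attribute-matching baseline, not the method proposed in the thesis. It assumes an image has already been mapped to a vector in attribute space, and all class names, attribute values, and function names below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict(embedding, class_attributes, candidates):
    """Return the candidate class whose attribute vector is most similar to the embedding."""
    return max(candidates, key=lambda c: cosine(embedding, class_attributes[c]))

# Hypothetical semantic side information: [striped, hooved, winged]
CLASS_ATTRIBUTES = {
    "horse": [0.0, 1.0, 0.0],  # seen during training
    "eagle": [0.0, 0.0, 1.0],  # seen during training
    "zebra": [1.0, 1.0, 0.0],  # unseen
    "tiger": [1.0, 0.0, 0.0],  # unseen
}
SEEN = ["horse", "eagle"]
UNSEEN = ["zebra", "tiger"]

# Assumed output of some image-to-attribute model for a test image.
embedding = [0.9, 0.8, 0.1]

# ZSL: the label space at test time contains only unseen classes.
zsl_label = predict(embedding, CLASS_ATTRIBUTES, UNSEEN)

# GZSL: the label space contains both seen and unseen classes.
gzsl_label = predict(embedding, CLASS_ATTRIBUTES, SEEN + UNSEEN)
```

In this toy setup the only change between the two settings is the candidate label set. A model biased towards seen classes would tend to produce embeddings closer to seen-class attributes in the GZSL case, which is the bias the abstract refers to.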
Publication Year
2023
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty