ABSTRACT
The goal of Zero-shot Learning (ZSL) is to recognize categories that are not seen during training. The traditional approach is to learn an embedding space and map visual and semantic features into this common space. However, this approach inevitably suffers from the bias problem, i.e., unseen instances are often incorrectly recognized as seen classes. An alternative paradigm instead uses generative models to hallucinate the features of unseen samples. However, generative models often suffer from instability, making it impractical for them to generate fine-grained features of unseen samples and thus yielding very limited improvement. To address this, we propose a Semantic Enhanced Cross-modal GAN (SECM GAN), which imposes a cross-modal association to improve the semantic and discriminative properties of the generated features. Specifically, we first train a cross-modal embedding model, the Semantic Enhanced Cross-modal Model (SECM), which is constrained by discrimination and semantics. We then train our generative model, SECM GAN, based on a Generative Adversarial Network (GAN): the generator produces cross-modal features, and the discriminator distinguishes real cross-modal features from generated ones. We deploy SECM as a weak constraint on the GAN, which reduces the reliance on the GAN. Extensive experiments on three widely used ZSL datasets demonstrate the superiority of our framework.
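The generative ZSL paradigm contrasted above can be sketched as follows. This is a toy illustration, not the paper's method: the fixed random projections stand in for a trained generator, the class names and dimensions are invented, and the classifier is a simple nearest-class-mean rule over hallucinated features.

```python
import numpy as np

rng = np.random.default_rng(0)
ATTR_DIM, FEAT_DIM, NOISE_DIM = 8, 16, 4

# Per-class semantic attribute vectors; imagine "horse" was unseen in training.
attrs = {"zebra": rng.normal(size=ATTR_DIM),
         "horse": rng.normal(size=ATTR_DIM)}

# Fixed random projections stand in for the weights of a trained generator.
W_attr = rng.normal(size=(ATTR_DIM, FEAT_DIM))
W_noise = rng.normal(size=(NOISE_DIM, FEAT_DIM))

def generate_features(attr, n=50):
    """Stand-in for a trained generator G(z, attr): a semantic attribute
    vector plus Gaussian noise is mapped into the visual feature space."""
    z = rng.normal(size=(n, NOISE_DIM))
    return attr @ W_attr + 0.1 * (z @ W_noise)

# Hallucinate features for every class (including unseen ones), turning
# ZSL into ordinary supervised classification; here the "classifier" is
# simply the nearest class mean in feature space.
class_means = {c: generate_features(a).mean(axis=0) for c, a in attrs.items()}

def classify(x):
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))
```

Once unseen-class features are synthesized, any off-the-shelf classifier can be trained on them; the paper's contribution lies in making the generated features semantic and discriminative, which this sketch does not model.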
Index Terms: Semantic Enhanced Cross-modal GAN for Zero-shot Learning