DOI: 10.1145/3469877.3490581

Semantic Enhanced Cross-modal GAN for Zero-shot Learning

Published: 10 January 2022

ABSTRACT

The goal of Zero-shot Learning (ZSL) is to recognize categories that are not seen during training. The traditional approach is to learn an embedding space and map visual and semantic features into this common space. However, this approach inevitably encounters the bias problem, i.e., unseen instances are often incorrectly recognized as seen classes. Another paradigm addresses this by using generative models to hallucinate the features of unseen samples. However, generative models often suffer from training instability, making it impractical for them to generate fine-grained features of unseen samples and thus yielding very limited improvement. To resolve this, we propose a Semantic Enhanced Cross-modal GAN (SECM GAN), which imposes cross-modal association to improve the semantic and discriminative properties of the generated features. Specifically, we first train a cross-modal embedding model, the Semantic Enhanced Cross-modal Model (SECM), which is constrained by discrimination and semantics. We then train our generative model based on a Generative Adversarial Network (GAN), called SECM GAN, in which the generator produces cross-modal features and the discriminator distinguishes true cross-modal features from generated ones. We deploy SECM as a weak constraint on the GAN, which reduces the reliance on the GAN alone. Extensive experiments on three widely used ZSL datasets demonstrate the superiority of our framework.
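As a rough illustration of the two-stage pipeline the abstract describes, the following PyTorch sketch pairs a discriminatively and semantically constrained embedding model (standing in for SECM) with a conditional WGAN-style generator and critic, and reuses the embedding model as a weak constraint on the generator. This is a minimal sketch under assumed conventions: every module name, dimension, and loss weight below is an illustrative guess, not the authors' released implementation.

```python
# Hypothetical sketch of the two-stage SECM GAN pipeline described in the
# abstract. All names, dimensions, and weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VIS_DIM, SEM_DIM, EMB_DIM, NOISE_DIM, NUM_SEEN = 2048, 312, 512, 128, 150


class SECM(nn.Module):
    """Stage 1: cross-modal embedding model mapping visual and semantic
    features into a common space, trained with a discriminative
    (classification) and a semantic (alignment) constraint."""
    def __init__(self):
        super().__init__()
        self.vis_enc = nn.Sequential(nn.Linear(VIS_DIM, EMB_DIM), nn.ReLU())
        self.sem_enc = nn.Sequential(nn.Linear(SEM_DIM, EMB_DIM), nn.ReLU())
        self.cls = nn.Linear(EMB_DIM, NUM_SEEN)

    def loss(self, vis, sem, labels):
        zv, zs = self.vis_enc(vis), self.sem_enc(sem)
        disc = F.cross_entropy(self.cls(zv), labels)   # discrimination
        align = F.mse_loss(zv, zs)                     # semantics
        return disc + align


class Generator(nn.Module):
    """Stage 2: hallucinates cross-modal features from noise
    conditioned on class semantics (e.g., attributes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + SEM_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, EMB_DIM))

    def forward(self, noise, sem):
        return self.net(torch.cat([noise, sem], dim=1))


class Critic(nn.Module):
    """Distinguishes true cross-modal features from generated ones."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM + SEM_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1))

    def forward(self, feat, sem):
        return self.net(torch.cat([feat, sem], dim=1))


def generator_loss(G, D, secm, sem, labels, lam=0.1):
    """Adversarial term plus SECM as a weak constraint: generated
    features should stay classifiable and close to their class's
    semantic embedding, reducing reliance on the GAN alone."""
    noise = torch.randn(sem.size(0), NOISE_DIM)
    fake = G(noise, sem)
    adv = -D(fake, sem).mean()                         # WGAN-style loss
    weak = F.cross_entropy(secm.cls(fake), labels) + \
           F.mse_loss(fake, secm.sem_enc(sem))
    return adv + lam * weak
```

In a full pipeline one would presumably pretrain and freeze the SECM, alternate critic and generator updates (typically with a gradient penalty for stability), then synthesize features for unseen classes from their semantics and train an ordinary classifier on them.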

Published in

MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
December 2021, 508 pages
ISBN: 9781450386074
DOI: 10.1145/3469877

Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

Overall Acceptance Rate: 59 of 204 submissions, 29%
