DOI: 10.1145/3581783.3611990
research-article

Frequency-based Zero-Shot Learning with Phase Augmentation

Published: 27 October 2023

Abstract

Zero-Shot Learning (ZSL) aims to recognize images from both seen and unseen classes by aligning visual and semantic knowledge (e.g., attribute descriptions). However, fine-grained attributes in the RGB domain are easily affected by background noise (e.g., a grey bird tail blending into the ground), which makes them hard to distinguish. Analyzing features in the frequency domain helps separate these attributes, because attribute patterns remain consistent across images, whereas noise tends to be more variable. Nevertheless, existing ZSL methods typically learn visual features directly in the RGB domain, which can impede the recognition of certain attributes. To overcome this limitation, we propose a novel ZSL method, the Frequency-based Phase Augmentation (FPA) network, which learns an effective representation of attributes in the frequency domain. Specifically, we introduce a Hybrid Phase Augmentation (HPA) module that transforms visual features into the frequency domain and augments their phase component to better retain the semantic information of the attributes. Using phase-augmented features enables FPA to capture semantic knowledge that is hard to distinguish in the RGB domain, to suppress noise, and to highlight significant attributes. Extensive experiments show that FPA achieves state-of-the-art performance on four standard datasets.
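The basic operation the abstract describes, moving visual features into the frequency domain and working on the phase, can be illustrated with a minimal sketch. This is not the authors' HPA implementation. It assumes a single-channel feature map given as a small list of lists, uses a naive 2D DFT in place of an FFT library, and `phase_augment` (with its `alpha` and `eps` parameters) is a hypothetical augmentation that adds a scaled phase-only reconstruction back onto the features. All function names here are illustrative.

```python
import cmath
import math

# Illustrative sketch only: a naive 2D DFT on a tiny H x W grid standing in
# for one channel of a CNN feature map. Real models would use an FFT library.


def dft2(x):
    """Naive 2D discrete Fourier transform of an H x W grid (list of lists)."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * math.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)]
            for u in range(h)]


def idft2(X):
    """Inverse of dft2; returns complex values (imaginary part ~0 for real input)."""
    h, w = len(X), len(X[0])
    return [[sum(X[u][v] * cmath.exp(2j * math.pi * (u * m / h + v * n / w))
                 for u in range(h) for v in range(w)) / (h * w)
             for n in range(w)]
            for m in range(h)]


def amplitude_phase(X):
    """Split a complex spectrum into amplitude |X| and phase angle(X)."""
    amp = [[abs(z) for z in row] for row in X]
    pha = [[cmath.phase(z) for z in row] for row in X]
    return amp, pha


def combine(amp, pha):
    """Rebuild a complex spectrum from amplitude and phase: amp * exp(i * pha)."""
    return [[cmath.rect(a, p) for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(amp, pha)]


def phase_augment(x, alpha=0.5, eps=1e-12):
    """HYPOTHETICAL phase augmentation: add a scaled phase-only reconstruction
    (unit amplitude, original phase) back onto the spatial features.
    This is an illustrative stand-in, not the paper's HPA module."""
    amp, pha = amplitude_phase(dft2(x))
    # Phase is undefined where the amplitude is ~0, so those bins are dropped.
    unit = [[1.0 if a > eps else 0.0 for a in row] for row in amp]
    phase_only = idft2(combine(unit, pha))
    return [[x[m][n] + alpha * phase_only[m][n].real for n in range(len(x[0]))]
            for m in range(len(x))]
```

For a single-spike map (for example, a 4 x 4 grid with one non-zero cell), every frequency bin has the same amplitude, so the phase-only reconstruction reproduces the spike at its original position with unit height. This small case reflects the well-known observation that phase carries much of a signal's positional structure, which is consistent with the abstract's motivation for augmenting the phase rather than the amplitude.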


Cited By

  • (2024) Towards Discriminative Feature Generation for Generalized Zero-Shot Learning. IEEE Transactions on Multimedia, 26, 10514-10529. DOI: 10.1109/TMM.2024.3408048. Online publication date: 1-Jan-2024.
  • (2024) Attention-driven frequency-based Zero-Shot Learning with phase augmentation. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02512-w. Online publication date: 30-Dec-2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. joint embedding
    2. object recognition
    3. zero-shot learning


    Funding Sources

    • Anhui Provincial Natural Science Foundation
    • National Natural Science Foundation of China

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada


    Article Metrics

    • Downloads (last 12 months): 114
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 05 Mar 2025
