research-article

Frequency-based Zero-Shot Learning with Phase Augmentation

Authors:

Yongdong Zhang

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 3181 - 3189

https://doi.org/10.1145/3581783.3611990

Published: 27 October 2023

Abstract

Zero-Shot Learning (ZSL) aims to recognize images from seen and unseen classes by aligning visual and semantic knowledge (e.g., attribute descriptions). However, the fine-grained attributes in the RGB domain can be easily affected by background noise (e.g., the grey bird tail blending with the ground), making it difficult to effectively distinguish them. Analyzing the features in the frequency domain assists in better distinguishing the attributes since their patterns remain consistent across different images, unlike noise which may be more variable. Nevertheless, existing ZSL methods typically learn visual features directly from the RGB domain, which can impede the recognition of certain attributes. To overcome this limitation, we propose a novel ZSL method named Frequency-based Phase Augmentation (FPA) network, which learns an effective representation of the attributes in the frequency domain. Specifically, we introduce a Hybrid Phase Augmentation (HPA) module to transform visual features into the frequency domain and augment the phase component for better retention of semantic information of the attributes. The use of phase-augmented features enables FPA to capture more semantic knowledge that can be challenging to distinguish in the RGB domain, suppress noise, and highlight significant attributes. Our extensive experiments show that FPA achieves state-of-the-art performance across four standard datasets.
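As a rough illustration of the amplitude/phase split that the abstract refers to, the sketch below decomposes a 1-D signal with a discrete Fourier transform, flattens its amplitude spectrum, and reconstructs the signal with the phase left unchanged. It is not the paper's HPA module, whose design the abstract does not describe. The function names (`dft`, `idft`, `split`, `recombine`, `phase_emphasized`), the blending factor `alpha`, and the use of a 1-D signal in place of 2-D visual features are all assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft; returns a list of complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split(spectrum):
    """Separate a complex spectrum into amplitude and phase lists."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude, phase):
    """Rebuild a complex spectrum from amplitude and phase lists."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


def phase_emphasized(signal, alpha):
    """Hypothetical augmentation (an assumption, not the paper's method):
    blend each amplitude toward the mean amplitude by alpha in [0, 1]
    while leaving the phase spectrum untouched."""
    amplitude, phase = split(dft(signal))
    mean_amp = sum(amplitude) / len(amplitude)
    mixed = [(1 - alpha) * a + alpha * mean_amp for a in amplitude]
    # Keep only the real part of the inverse transform.
    return [c.real for c in idft(recombine(mixed, phase))]
```

With `alpha = 0` the function returns the input signal up to rounding. With `alpha = 1` every frequency bin has the same amplitude, so the reconstruction is determined by the phase spectrum and one overall scale.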

Cited By

Ge J, Xie H, Li P, Xie L, Min S, Zhang Y (2024). Towards Discriminative Feature Generation for Generalized Zero-Shot Learning. IEEE Transactions on Multimedia, 26, 10514-10529. DOI: 10.1109/TMM.2024.3408048. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TMM.2024.3408048
Yin W, Ge J, Zhang L, Li P, Liu Y, Xie H (2024). Attention-driven frequency-based Zero-Shot Learning with phase augmentation. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02512-w. Online publication date: 30-Dec-2024.
https://doi.org/10.1007/s13042-024-02512-w

Index Terms

Frequency-based Zero-Shot Learning with Phase Augmentation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition

Information & Contributors

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada & MBZUAI, UAE
Tao Mei, HiDream.ai, China
Rita Cucchiara, University of Modena and Reggio Emilia, Italy

Program Chairs:
Marco Bertini, University of Florence, Italy
Diana Patricia Tobon Vallejo, Universidad de Medellin, Colombia
Pradeep K. Atrey, University at Albany, State University of New York, USA
M. Shamim Hossain, King Saud University, KSA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Anhui Provincial Natural Science Foundation
National Natural Science Foundation of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 206
Downloads (Last 12 months): 114
Downloads (Last 6 weeks): 3

Reflects downloads up to 05 Mar 2025
