DOI: 10.1145/3589334.3645554

Zero-shot Image Classification with Logic Adapter and Rule Prompt

Published: 13 May 2024

Abstract

Zero-shot image classification, which aims to predict unseen classes whose samples never appear during training, is crucial in the Web domain because new web images constantly appear on various websites. Attributes, as annotations of class-level characteristics, are widely used semantic information for this task. However, most current methods fail to capture the discriminative image features that distinguish similar images from different classes, because they focus solely on limited visual-attribute feature alignment, which leads to unsatisfactory zero-shot classification results. Therefore, we propose a Zero-Shot image Classification method with a Logic adapter and Rule prompts, called ZSCLR, which uses the logic adapter and rule prompts to encourage the model to capture discriminative image features and to perform reasoning. Specifically, ZSCLR consists of a visual perception module and a logic adapter. The visual perception module extracts image features from the training data, while the logic adapter uses a Markov logic network to encode the extracted image features together with the rule prompts and thereby refine the discriminative image features. Because the predicates of the rule prompts represent symbolic discriminative features, the model can focus on these features and classify images more precisely. In addition, the reasoning of the Markov logic network allows the logic adapter to transfer the model from recognizing images of seen classes to recognizing images of unseen classes. We conduct experiments on three standard zero-shot image classification benchmarks, where ZSCLR achieves competitive performance. Furthermore, ZSCLR can explain its predictions through the rule prompts.
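
To make the role of the logic adapter and rule prompts concrete, the sketch below (Python, not the paper's code) illustrates the general Markov-logic-network scoring idea: weighted rules over attribute predicates turn attribute evidence into class scores, and rules written for unseen classes can be applied without any training images of those classes. Every name in the sketch (the attribute names, the rules, the weights, and the visual_perception stub) is a made-up placeholder introduced only for illustration.

    # A minimal, hypothetical sketch of Markov-logic-style rule scoring; it is
    # NOT the authors' implementation. Attribute names, rules, weights, and the
    # visual_perception stub below are placeholders for illustration only.
    import numpy as np

    def visual_perception(image):
        """Stand-in for the visual perception module: maps an image to
        per-attribute confidence scores in [0, 1] (here, random toy values)."""
        rng = np.random.default_rng(0)
        return {attr: float(rng.uniform()) for attr in ("striped", "hooved", "aquatic")}

    # Rule prompts expressed as weighted clauses. In a Markov logic network,
    # every satisfied grounding of a rule adds the rule's weight to the
    # log-score of a possible world (Richardson & Domingos, 2006).
    RULES = [
        # (weight, class, attribute predicates) -- hypothetical examples
        (2.0, "zebra",   ("striped", "hooved")),
        (1.5, "dolphin", ("aquatic",)),
        (0.5, "horse",   ("hooved",)),
    ]

    def logic_adapter(attr_scores):
        """Soft MLN-style inference: a class's log-score is the sum of rule
        weights, each scaled by how strongly its attribute predicates are
        observed in the image; a softmax then yields class probabilities."""
        log_scores = {}
        for weight, cls, attrs in RULES:
            satisfaction = float(np.prod([attr_scores[a] for a in attrs]))
            log_scores[cls] = log_scores.get(cls, 0.0) + weight * satisfaction
        classes = list(log_scores)
        z = np.array([log_scores[c] for c in classes])
        probs = np.exp(z - z.max())
        probs /= probs.sum()
        return dict(zip(classes, probs))

    if __name__ == "__main__":
        attrs = visual_perception(image=None)   # toy attribute confidences
        print(logic_adapter(attrs))             # class probabilities, e.g. {'zebra': ...}

In ZSCLR itself the rule prompts and the visual features are encoded jointly by the model rather than scored with hand-written weights; the sketch only illustrates why weighted symbolic rules over discriminative attribute predicates can transfer from seen classes to unseen ones.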

Supplemental Material

MP4 File
Supplemental video




    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN: 9798400701719
    DOI: 10.1145/3589334


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. image classification
    2. logic adapter
    3. markov logic network
    4. rule prompt
    5. zero-shot learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China

    Conference

    WWW '24
    Sponsor:
    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

