Aligning Image Semantics and Label Concepts for Image Multi-Label Classification

Authors:

Haifeng Hu

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 19, Issue 2

Article No.: 75, Pages 1 - 23

https://doi.org/10.1145/3550278

Published: 06 February 2023

Abstract

The image multi-label classification task aims to correctly predict the multiple object categories present in an image. To capture the correlation between labels, methods based on graph convolutional networks must manually count label co-occurrence probabilities from the training data to construct a pre-defined graph as the input to the graph network, which is inflexible and may degrade model generalizability. Moreover, most current methods cannot effectively align the learned salient object features with the label concepts, so the model's predictions may be inconsistent with the image content. Therefore, learning the salient semantic features of images, capturing the correlation between labels, and then effectively aligning the two is one of the keys to improving performance on image multi-label classification. To this end, we propose a novel image multi-label classification framework that aligns Image Semantics with Label Concepts (ISLC). Specifically, we propose a residual encoder to learn salient object features in images, and exploit the self-attention layer in an aligned decoder to automatically capture the correlation between labels. We then leverage the cross-attention layers in the aligned decoder to align image semantic features with label concepts, making the predicted labels more consistent with the image content. Finally, the output features of the last layers of the residual encoder and the aligned decoder are fused to obtain the final feature for classification. The proposed ISLC model achieves 87.2%, 96.9%, 39.4%, and 64.2% on the widely used multi-label image datasets MS-COCO 2014, PASCAL VOC 2007, VG-500, and NUS-WIDE, respectively.
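The decoder idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal, single-head, projection-free toy using only the Python standard library, and every name and size in it (`label_embed`, `image_feats`, `num_labels`, the mean-pooling fusion, the per-label linear scorer) is a hypothetical choice made for illustration. It shows label queries first attending to each other (self-attention, so label correlations are computed rather than pre-counted), then attending to image region features (cross-attention), followed by a simple fusion and a per-label sigmoid.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = len(queries[0])
    scores = [[v / math.sqrt(d) for v in row]
              for row in matmul(queries, transpose(keys))]
    weights = softmax_rows(scores)
    return matmul(weights, values), weights

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
num_labels, num_regions, dim = 4, 6, 8  # hypothetical toy sizes

label_embed = random_matrix(num_labels, dim, rng)   # one query vector per label (assumed)
image_feats = random_matrix(num_regions, dim, rng)  # stand-in for encoder output features

# Step 1: self-attention among label queries; label_corr is a
# num_labels x num_labels matrix of label-to-label attention weights.
label_ctx, label_corr = attention(label_embed, label_embed, label_embed)
label_ctx = add(label_embed, label_ctx)  # residual connection

# Step 2: cross-attention; each label query attends over image regions.
aligned, align_map = attention(label_ctx, image_feats, image_feats)
aligned = add(label_ctx, aligned)  # residual connection

# Step 3: fuse decoder output with a global image feature. Mean pooling
# plus addition is an assumption made here, not the paper's fusion rule.
global_feat = [sum(col) / num_regions for col in zip(*image_feats)]
fused = [[a + g for a, g in zip(row, global_feat)] for row in aligned]

# One linear scorer per label (assumed), followed by a sigmoid so that
# each label is predicted independently, as multi-label output requires.
class_weights = random_matrix(num_labels, dim, rng)
logits = [sum(f * w for f, w in zip(fr, wr)) for fr, wr in zip(fused, class_weights)]
probs = [sigmoid(z) for z in logits]
```

In this sketch each row of `align_map` sums to 1 and indicates which image regions a given label attends to, which is the sense in which the cross-attention step "aligns" label concepts with image semantics. A real model would add learned query, key, and value projections, multiple heads, and feed-forward layers.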


Index Terms

Aligning Image Semantics and Label Concepts for Image Multi-Label Classification
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
2. Networks
  1. Network architectures

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 19, Issue 2

March 2023

540 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3572860

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 February 2023

Online AM: 21 July 2022

Accepted: 19 July 2022

Revised: 07 July 2022

Received: 28 February 2022

Published in TOMM Volume 19, Issue 2

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
Science and Technology Program of Guangdong Province
