research-article

Attention-Augmented Memory Network for Image Multi-Label Classification

Authors:

Tao Su

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3

Article No.: 116, Pages 1 - 24

https://doi.org/10.1145/3570166

Published: 25 February 2023

Abstract

Image multi-label classification aims to predict all the object categories present in an image. Some recent works exploit graph convolutional networks to capture the correlation between labels. Although promising results have been reported, these methods cannot learn salient object features in the images and ignore the correlation between channel feature maps. In addition, current research learns feature information only within an individual input image and fails to mine the contextual information of various categories from the dataset to enhance the input feature representation. To address these issues, we propose an Attention-Augmented Memory Network (AAMN) model for the image multi-label classification task. First, we propose a novel categorical memory module that mines the contextual information of various categories from the dataset to augment the current input feature. Second, we design a new channel-relation exploration module that captures the inter-channel relationships of features, so as to enhance the correlation between objects in the images. Third, we develop a spatial-relation enhancement module that models second-order statistics of features and captures long-range dependencies between pixels in feature maps, so as to learn salient object features. Experimental results on standard benchmarks, including MS-COCO 2014, PASCAL VOC 2007, and VG-500, demonstrate the effectiveness of the AAMN model, which outperforms current state-of-the-art methods.
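The three modules named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the AAMN implementation: the abstract gives no equations, so the function names, the residual additions, the softmax normalisation, and the use of plain dot-product attention are all assumptions made for illustration. Features are taken to be an N x D list of lists (N spatial positions, D channels), and the memory a hypothetical K x D table with one prototype per category.

```python
import math

def matmul(a, b):
    # Plain matrix product of two list-of-lists matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    # Element-wise sum, used here as an assumed residual connection.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax_rows(a):
    # Numerically stable softmax applied to each row.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def memory_augment(feats, memory):
    # Assumed memory read: each position attends over K category
    # prototypes and adds the retrieved context to its own feature.
    attn = softmax_rows(matmul(feats, transpose(memory)))  # N x K
    return add(feats, matmul(attn, memory))                # N x D

def channel_relation(feats):
    # Assumed channel attention: a D x D affinity between channel maps
    # re-weights the channels before a residual addition.
    ft = transpose(feats)                                  # D x N
    affinity = softmax_rows(matmul(ft, feats))             # D x D
    return add(feats, transpose(matmul(affinity, ft)))     # N x D

def spatial_relation(feats):
    # Assumed spatial attention: the N x N Gram matrix of positions
    # (a second-order statistic) links every pair of positions.
    attn = softmax_rows(matmul(feats, transpose(feats)))   # N x N
    return add(feats, matmul(attn, feats))                 # N x D

# Toy usage: 3 positions, 2 channels, 2 hypothetical category prototypes.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[1.0, 0.0], [0.0, 1.0]]
out = spatial_relation(channel_relation(memory_augment(feats, memory)))
```

Chaining the three functions in this order is also an assumption; the abstract does not state how the modules are composed. The output keeps the N x D shape of the input, so a classifier head could be attached after any stage.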

Index Terms

Attention-Augmented Memory Network for Image Multi-Label Classification
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
2. Networks
  1. Network architectures

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 19, Issue 3

May 2023

514 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3582886

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 February 2023

Online AM: 03 November 2022

Accepted: 27 October 2022

Revised: 26 July 2022

Received: 10 April 2022

Published in TOMM Volume 19, Issue 3

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Science and Technology Program of Guangdong Province
