Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification

Authors:

Shuqiang Jiang,

Qingming Huang

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 1092 - 1100

https://doi.org/10.1145/3240508.3240649

Published: 15 October 2018

Abstract

Multi-label image classification is a fundamental and challenging task in computer vision, and it has recently achieved significant progress by exploiting semantic relations among labels. However, the spatial positions of labels in multi-label images are usually not provided in real scenarios, which poses an insuperable barrier to conventional models. In this paper, we propose an end-to-end attentive recurrent neural network for multi-label image classification under only image-level supervision, which learns discriminative feature representations and models label relations simultaneously. First, inspired by the attention mechanism, we propose a recurrent highlight network (RHN) that focuses on the most related regions in the image to learn discriminative feature representations for different objects in an iterative manner. Second, we develop a gated recurrent relation extractor (GRRE) to model the label relations using multiplicative gates in a recurrent fashion, which learns to decide how multiple labels of the image influence the relation extraction. Extensive experiments on three benchmark datasets show that our model outperforms state-of-the-art methods and performs better on small-object categories and in scenarios with a large number of labels.
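
The abstract gives no equations, so the following is only an illustrative sketch of the two ideas it names: attending to image regions iteratively, and updating a recurrent state through multiplicative gates. It assumes soft (softmax) attention and a GRU-style gate, neither of which is confirmed by the abstract as the authors' formulation. All function names, dimensions, and the random weights are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def attend(regions, state):
    """Soft attention (an assumption): weight regions by similarity to the state."""
    weights = softmax([dot(r, state) for r in regions])
    dim = len(state)
    glimpse = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return glimpse, weights


def gated_update(state, glimpse, w_gate, w_cand):
    """GRU-style update (an assumption): a multiplicative gate mixes old and new."""
    joined = state + glimpse  # list concatenation, length 2 * dim
    gate = [sigmoid(x) for x in matvec(w_gate, joined)]
    cand = [math.tanh(x) for x in matvec(w_cand, joined)]
    return [(1.0 - g) * s + g * c for g, s, c in zip(gate, state, cand)]


def run(regions, steps, w_gate, w_cand, w_out):
    """Alternate attention and gated updates, then emit one sigmoid score per label."""
    state = [0.0] * len(regions[0])
    attention_trace = []
    for _ in range(steps):
        glimpse, weights = attend(regions, state)
        state = gated_update(state, glimpse, w_gate, w_cand)
        attention_trace.append(weights)
    scores = [sigmoid(x) for x in matvec(w_out, state)]
    return scores, attention_trace


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes; untrained random weights stand in for learned ones.
rng = random.Random(0)
DIM, NUM_REGIONS, NUM_LABELS, STEPS = 4, 6, 3, 3
regions = random_matrix(NUM_REGIONS, DIM, rng)
w_gate = random_matrix(DIM, 2 * DIM, rng)
w_cand = random_matrix(DIM, 2 * DIM, rng)
w_out = random_matrix(NUM_LABELS, DIM, rng)
scores, attention_trace = run(regions, STEPS, w_gate, w_cand, w_out)
print(scores)
```

Each label gets an independent sigmoid score because several labels can hold for the same image. Only image-level labels would be needed to train such a model, since the attention weights are produced internally rather than supervised with object positions.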

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll, University of Oldenburg, Germany
Kyoung Mu Lee, Seoul National University, Korea
Jiebo Luo, University of Rochester, USA
Wenwu Zhu, Tsinghua University, China

Program Chairs:
Hyeran Byun, Yonsei University, Korea
Chang Wen Chen, State Univ. of New York at Buffalo, USA
Rainer Lienhart, University of Augsburg, Germany
Tao Mei, JD AI, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2018

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
National Basic Research Program of China
Key Research Program of Frontier Sciences

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 31
Total Downloads: 452
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Citations

Cited By

Wang X, Xuan X, Xu Q, Cai H, Shen W. (2025). Semantic Abstractions for Multi-label Classification. Artificial Intelligence Logic and Applications, 143-151. Online publication date: 31-Jan-2025.
https://doi.org/10.1007/978-981-96-0354-1_12

Zhang J, Li L, Yan C, Wang Z, Xu C, Zhang J, Chen C. (2024). Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation. ACM Transactions on Multimedia Computing, Communications, and Applications, 20:9 (1-23). Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1145/3672397

Yin H. (2024). TFAD: An Image Multi-Label Recognition Method with Image-Text Powered Attention. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. Online publication date: 30-Jun-2024.
https://doi.org/10.1109/IJCNN60899.2024.10650309

Ni Y, Cong Y, Zhao C, Yu J, Wang Y, Zhou G, Shen M. (2024). Active learning based on multi-enhanced views for classification of multiple patterns in lung ultrasound images. Computerized Medical Imaging and Graphics, 118 (102454). Online publication date: Dec-2024.
https://doi.org/10.1016/j.compmedimag.2024.102454

Liang K, Wang X, Zhang H, Ma Z, Guo J, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M. (2023). Hierarchical Visual Attribute Learning in the Wild. Proceedings of the 31st ACM International Conference on Multimedia, 3415-3423. Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3612274

Yuan J, Chen S, Zhang Y, Shi Z, Geng X, Fan J, Rui Y. (2023). Graph Attention Transformer Network for Multi-label Image Classification. ACM Transactions on Multimedia Computing, Communications, and Applications, 19:4 (1-16). Online publication date: 27-Feb-2023.
https://dl.acm.org/doi/10.1145/3578518

Feng G, Hu Z, Zhang L, Sun J, Lu H. (2023). Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation. IEEE Transactions on Neural Networks and Learning Systems, 34:5 (2246-2258). Online publication date: May-2023.
https://doi.org/10.1109/TNNLS.2021.3106153

Zhou M, Lan X, Wei X, Liao X, Mao Q, Li Y, Wu C, Xiang T, Fang B. (2023). An End-to-End Blind Image Quality Assessment Method Using a Recurrent Network and Self-Attention. IEEE Transactions on Broadcasting, 69:2 (369-377). Online publication date: Jun-2023.
https://doi.org/10.1109/TBC.2022.3215249

Tan M, Yuan F, Yu J, Wang G, Gu X. (2022). Fine-grained Image Classification via Multi-scale Selective Hierarchical Biquadratic Pooling. ACM Transactions on Multimedia Computing, Communications, and Applications, 18:1s (1-23). Online publication date: 25-Jan-2022.
https://dl.acm.org/doi/10.1145/3492221

Wang Z, Fang Z, Li D, Yang H, Du W. (2022). Semantic Supplementary Network With Prior Information for Multi-Label Image Classification. IEEE Transactions on Circuits and Systems for Video Technology, 32:4 (1848-1859). Online publication date: Apr-2022.
https://doi.org/10.1109/TCSVT.2021.3083978
