DOI: 10.1145/3338533.3366589
research-article

Multi-Label Image Classification with Attention Mechanism and Graph Convolutional Networks

Published: 10 January 2020

ABSTRACT

The task of multi-label image classification is to predict a set of proper labels for an input image. To this end, it is necessary to strengthen the association between the labels and the image regions, and to exploit the relationships among the labels. In this paper, we propose a novel framework for multi-label image classification that combines an attention mechanism with a Graph Convolutional Network (GCN). The attention mechanism focuses on specific target regions while ignoring irrelevant surrounding information, thereby strengthening the association between labels and image regions. By constructing a directed graph over the labels, the GCN learns label relationships from a global perspective and maps the label graph to a set of inter-dependent object classifiers. The framework first uses ResNet to extract features, while the attention mechanism generates attention maps for all labels to obtain attention-weighted features. The GCN-generated classifiers are then applied to the fused ResNet and attention-weighted features to produce the final classification. Experimental results show that both the attention mechanism and the GCN effectively improve classification performance, and that the proposed framework is competitive with state-of-the-art methods.
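
The abstract describes the pipeline only at a high level. The sketch below is a rough, hypothetical PyTorch rendering of that kind of architecture, assuming a ResNet-101 backbone, 1x1-convolution attention maps (one per label), 300-d GloVe-style label embeddings, a pre-normalized label adjacency matrix, and a simple additive fusion of the global feature with the attention-weighted features. All of these choices are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a ResNet + per-label attention + label-GCN classifier.
# Layer names, sizes, and the fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class LabelGCN(nn.Module):
    """Two-layer GCN mapping label word embeddings to classifier weights."""

    def __init__(self, adj, in_dim=300, hidden_dim=512, out_dim=2048):
        super().__init__()
        self.register_buffer("adj", adj)      # (C, C) normalized label adjacency
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, label_emb):             # label_emb: (C, in_dim)
        h = F.leaky_relu(self.adj @ self.fc1(label_emb), 0.2)
        return self.adj @ self.fc2(h)         # (C, out_dim) inter-dependent classifiers


class AttentionGCNClassifier(nn.Module):
    def __init__(self, num_classes, adj, label_emb):
        super().__init__()
        backbone = models.resnet101(weights=None)
        # Drop the average pooling and FC layers to keep the spatial feature map.
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H, W)
        self.attn = nn.Conv2d(2048, num_classes, kernel_size=1)         # one attention map per label
        self.gcn = LabelGCN(adj)
        self.register_buffer("label_emb", label_emb)                    # (C, 300), e.g. GloVe vectors

    def forward(self, x):
        feat = self.features(x)                                          # (B, 2048, H, W)
        # Per-label spatial attention over the feature map.
        attn = torch.softmax(self.attn(feat).flatten(2), dim=-1)         # (B, C, H*W)
        # Attention-weighted features: one 2048-d vector per label.
        weighted = torch.bmm(attn, feat.flatten(2).transpose(1, 2))      # (B, C, 2048)
        # Fuse the global image feature with the attended features (assumed additive fusion).
        global_feat = feat.mean(dim=(2, 3)).unsqueeze(1)                 # (B, 1, 2048)
        fused = weighted + global_feat                                   # (B, C, 2048)
        # Apply the GCN-generated classifiers to the fused features.
        classifiers = self.gcn(self.label_emb)                           # (C, 2048)
        return (fused * classifiers.unsqueeze(0)).sum(-1)                # (B, C) logits


# Example usage with hypothetical inputs (20 labels, GloVe-sized embeddings):
# adj = torch.eye(20); emb = torch.randn(20, 300)
# model = AttentionGCNClassifier(20, adj, emb)
# logits = model(torch.randn(2, 3, 448, 448))    # (2, 20)
```

In a full training setup, the per-label logits would typically be optimized with a multi-label binary cross-entropy loss (e.g. BCEWithLogitsLoss), one binary target per label.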


Published in
          MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
          December 2019
          403 pages
ISBN: 9781450368414
DOI: 10.1145/3338533

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 January 2020


          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

MMAsia '19 Paper Acceptance Rate: 59 of 204 submissions, 29%
Overall Acceptance Rate: 59 of 204 submissions, 29%

