Improving multi-label classification using scene cues

Li, Zhao; Lu, Wei; Sun, Zhanquan; Xing, Weiwei

doi:10.1007/s11042-017-4517-0

Published: 08 March 2017

Volume 77, pages 6079–6094 (2018)
Multimedia Tools and Applications

Zhao Li¹,², Wei Lu¹, Zhanquan Sun² & Weiwei Xing¹

4553 Accesses
3 Citations

Abstract

Multi-label classification is one of the most challenging tasks in the computer vision community, owing to the varied composition of, and interactions between, objects in multi-label images (e.g. partial visibility or occlusion). Intuitively, certain objects tend to co-occur with specific scenes; for example, a sofa often appears in a living room. The scene of a given image may therefore provide informative cues for identifying the objects embedded in it. In this paper, we propose a novel scene-aware deep framework for addressing the multi-label classification task. In particular, we incorporate two sub-networks that are pre-trained for different tasks (i.e. object classification and scene classification) into a unified framework, so that scene-aware cues can be leveraged to benefit multi-label object classification. In addition, we present a novel one-vs-all multiple-cross-entropy (MCE) loss for optimizing the proposed framework by independently penalizing the classification error for each label. The proposed method can be learned in an end-to-end manner, and extensive experimental results on Pascal VOC 2007 and MS COCO demonstrate that our approach noticeably improves multi-label classification.
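
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the MCE loss or say how the two sub-networks are combined, so the sketch assumes that object and scene features are fused by concatenation and that the one-vs-all loss is an average of independent sigmoid cross-entropy terms, one per label. The names fuse_features, label_logits and one_vs_all_cross_entropy are hypothetical.

```python
import math


def fuse_features(object_feat, scene_feat):
    # Assumption: fuse the two sub-network outputs by simple concatenation.
    return list(object_feat) + list(scene_feat)


def label_logits(features, weights, biases):
    # One independent linear scorer per label (one-vs-all).
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def one_vs_all_cross_entropy(logits, targets):
    # Assumption: each label contributes its own binary cross-entropy term,
    # so an error on one label is penalized independently of the others.
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


if __name__ == "__main__":
    # Toy numbers only; real features would come from the pre-trained networks.
    fused = fuse_features([1.0, 2.0], [3.0])
    logits = label_logits(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, -1.0])
    print(one_vs_all_cross_entropy(logits, [1, 0]))
```

Because each label has its own term, an image may carry any number of positive labels, which is what distinguishes this kind of loss from a single softmax over all classes.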

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61272353, 61370128, 61428201), the Program for New Century Excellent Talents in University (NCET-13-0659), and the Scientific and Technological Research of Shandong, China (No. 2016GGX101029).

Author information

Authors and Affiliations

School of Software Engineering, Beijing Jiaotong University, Beijing, China
Zhao Li, Wei Lu & Weiwei Xing
Shandong Computer Science Center (National Supercomputer Center in Jinan), Shandong Provincial Key Laboratory of Computer Networks, Jinan, China
Zhao Li & Zhanquan Sun

Corresponding author

Correspondence to Zhao Li.

About this article

Cite this article

Li, Z., Lu, W., Sun, Z. et al. Improving multi-label classification using scene cues. Multimed Tools Appl 77, 6079–6094 (2018). https://doi.org/10.1007/s11042-017-4517-0

Received: 31 March 2016
Revised: 13 February 2017
Accepted: 16 February 2017
Published: 08 March 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s11042-017-4517-0
