Improving multi-label classification using scene cues

Multimedia Tools and Applications

Abstract

Multi-label classification is one of the most challenging tasks in computer vision, owing to the varied composition of and interactions between objects (e.g. partial visibility or occlusion) in multi-label images. Intuitively, some objects tend to co-occur with specific scenes, e.g. a sofa often appears in a living room. Therefore, the scene of a given image may provide informative cues for identifying the objects it contains. In this paper, we propose a novel scene-aware deep framework for the multi-label classification task. In particular, we incorporate two sub-networks that are pre-trained for different tasks (i.e. object classification and scene classification) into a unified framework, so that scene-aware cues can be leveraged to benefit multi-label object classification. In addition, we present a novel one-vs-all multiple-cross-entropy (MCE) loss for optimizing the proposed framework by independently penalizing the classification error for each label. The proposed method can be trained end-to-end, and extensive experimental results on Pascal VOC 2007 and MS COCO demonstrate that our approach yields a noticeable improvement on the multi-label classification task.
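The idea of penalizing each label independently can be illustrated with a minimal sketch. The abstract does not give the exact form of the MCE loss, so the code below assumes a common one-vs-all formulation: each label receives its own sigmoid and binary cross-entropy term, and the terms are summed. The function names `sigmoid` and `multi_label_cross_entropy`, and the `eps` smoothing constant, are illustrative choices and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multi_label_cross_entropy(scores, labels, eps=1e-12):
    """Sum of independent binary cross-entropy terms, one per label.

    scores: raw (pre-sigmoid) network outputs, one per class.
    labels: 0/1 ground-truth indicators, one per class.
    This is an assumed one-vs-all form, not necessarily the paper's MCE.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)  # probability that this label is present
        # Each label is scored on its own; no softmax couples the classes.
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total


# Hypothetical image containing classes 0 and 2 out of three classes.
labels = [1, 0, 1]
good = multi_label_cross_entropy([4.0, -4.0, 4.0], labels)
bad = multi_label_cross_entropy([-4.0, 4.0, -4.0], labels)
```

Because every class has its own term, an image may carry any number of positive labels, which a single softmax over classes would not allow. In this toy example `good` is smaller than `bad`, since the first score vector agrees with the labels and the second contradicts them.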

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61272353, 61370128, 61428201), the Program for New Century Excellent Talents in University (NCET-13-0659), and the Scientific and Technological Research of Shandong, China (No. 2016GGX101029).

Author information

Corresponding author

Correspondence to Zhao Li.

About this article

Cite this article

Li, Z., Lu, W., Sun, Z. et al. Improving multi-label classification using scene cues. Multimed Tools Appl 77, 6079–6094 (2018). https://doi.org/10.1007/s11042-017-4517-0

  • DOI: https://doi.org/10.1007/s11042-017-4517-0
