Self-weighted discriminative metric learning based on deep features for scene recognition

Multimedia Tools and Applications

Abstract

Deep features have achieved impressive performance in image recognition. However, directly applying pre-trained deep models to scene images is not appropriate, because scene images exhibit high intra-class diversity and inter-class similarity. In this paper, a self-weighted discriminative metric learning (SDML) method based on deep features is proposed for scene recognition. First, a biobjective discriminative metric function is defined to fully exploit the discriminative information in deep features. Second, a self-weighted coefficient is integrated into the biobjective function to adaptively balance the discriminative distances within the same class and between different classes. Moreover, nuclear norm regularization is introduced to promote a low-rank metric. An alternating iterative optimization strategy is devised to solve the resulting problem effectively. Finally, the SDML method can be extended to a kernelized version. Experimental results on three scene datasets demonstrate the superiority of the proposed methods over several existing metric learning methods.
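
The abstract does not state the SDML objective, so the following is only a minimal sketch of two generic ingredients it names, not the authors' method. It assumes a Mahalanobis-type metric, d_M(x, z)^2 = (x - z)^T M (x - z), which the abstract does not confirm, and the function names `class_distance_sums` and `prox_nuclear_psd` are hypothetical. The first function sums squared distances over same-class and different-class pairs, the two quantities a biobjective function would trade off. The second applies eigenvalue soft-thresholding, the standard proximal step for a nuclear norm penalty on a symmetric positive semidefinite matrix, which shows how such a penalty drives a metric toward low rank.

```python
import numpy as np

def class_distance_sums(X, y, M):
    """Sum squared distances under M over same-class and different-class pairs.

    Assumes d_M(x, z)^2 = (x - z)^T M (x - z); ordered pairs are counted,
    so each unordered pair contributes twice.
    """
    diffs = X[:, None, :] - X[None, :, :]                  # shape (n, n, d)
    d2 = np.einsum('ijk,kl,ijl->ij', diffs, M, diffs)      # pairwise squared distances
    same = y[:, None] == y[None, :]                        # True where labels match
    return d2[same].sum(), d2[~same].sum()

def prox_nuclear_psd(M, tau):
    """Soft-threshold the eigenvalues of a symmetric matrix and clip at zero.

    For symmetric input this is the proximal operator of tau * (nuclear norm)
    restricted to the positive semidefinite cone.
    """
    M = (M + M.T) / 2.0                                    # enforce symmetry
    vals, vecs = np.linalg.eigh(M)
    vals = np.maximum(vals - tau, 0.0)                     # shrink, then clip
    return (vecs * vals) @ vecs.T

# Toy data (made up for illustration): two classes separated along the second axis.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])

within, between = class_distance_sums(X, y, np.eye(2))
M_low = prox_nuclear_psd(np.diag([2.0, 0.3]), tau=0.5)
print(within, between)
print(np.linalg.matrix_rank(M_low))
```

In this toy case the thresholding step zeroes the smaller eigenvalue, so the resulting metric has rank 1. How SDML actually weights the two distance sums and alternates between updates is described only in the full article.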

Author information

Corresponding author

Correspondence to Chen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, C., Peng, G. & Lin, W. Self-weighted discriminative metric learning based on deep features for scene recognition. Multimed Tools Appl 79, 2769–2788 (2020). https://doi.org/10.1007/s11042-019-08486-0

  • DOI: https://doi.org/10.1007/s11042-019-08486-0
