Abstract
Deep features have achieved impressive performance in image recognition. However, directly applying pre-trained deep models to scene images is not appropriate, because scene images exhibit high intra-class diversity and inter-class similarity. In this paper, a self-weighted discriminative metric learning (SDML) method based on deep features is proposed for scene recognition. Specifically, a biobjective discriminative metric function is first defined to fully exploit the discriminative information in deep features. Second, a self-weighted coefficient is integrated into the biobjective function to adaptively balance the discriminative distances within the same class and between different classes. Moreover, nuclear norm regularization is introduced to promote a low-rank metric. An alternating iterative optimization strategy is devised to solve the proposed problem effectively. Finally, the SDML method can be extended to a kernelized version. Experimental results on three scene datasets demonstrate the superiority of our methods over several existing metric learning methods.
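The abstract does not state the SDML objective itself, so the following is only a minimal sketch of the generic ingredients it names, not the authors' formulation. It assumes a Mahalanobis-style metric matrix M applied to feature vectors, sums squared distances over same-class and different-class pairs, and applies a standard proximal step for the nuclear norm on positive semidefinite matrices (eigenvalue soft-thresholding), which is one common way to encourage a low-rank metric. All function names and the toy data are hypothetical.

```python
import numpy as np

def mahalanobis_sq(M, x, y):
    """Squared distance (x - y)^T M (x - y) under a PSD metric matrix M."""
    d = x - y
    return float(d @ M @ d)

def class_distances(M, X, labels):
    """Sum squared distances over same-class and different-class pairs."""
    within, between = 0.0, 0.0
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            d = mahalanobis_sq(M, X[i], X[j])
            if labels[i] == labels[j]:
                within += d
            else:
                between += d
    return within, between

def nuclear_prox_psd(M, tau):
    """Proximal step for tau * ||M||_* restricted to PSD matrices.

    For a PSD matrix the nuclear norm equals the trace, so the step
    symmetrizes M, shifts its eigenvalues down by tau, and clips at zero.
    """
    S = (M + M.T) / 2.0
    w, V = np.linalg.eigh(S)
    w = np.maximum(w - tau, 0.0)
    return (V * w) @ V.T

# Toy data (assumed): two classes separated along the second axis.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
labels = [0, 0, 1, 1]
within, between = class_distances(np.eye(2), X, labels)

# Thresholding removes the weak direction and leaves a rank-1 metric.
M_low_rank = nuclear_prox_psd(np.diag([2.0, 0.5]), tau=1.0)
```

In an alternating scheme of the kind the abstract mentions, a step like `nuclear_prox_psd` would typically follow a gradient update of the metric, while the balancing coefficient between the within-class and between-class terms is updated separately; how SDML does this exactly is not specified here.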
Cite this article
Wang, C., Peng, G. & Lin, W. Self-weighted discriminative metric learning based on deep features for scene recognition. Multimed Tools Appl 79, 2769–2788 (2020). https://doi.org/10.1007/s11042-019-08486-0
DOI: https://doi.org/10.1007/s11042-019-08486-0