Building discriminative features of scene recognition using multi-stages of inception-ResNet-v2

Khan, Altaf; Chefranov, Alexander; Demirel, Hasan

doi:10.1007/s10489-023-04460-4

Published: 30 January 2023

Volume 53, pages 18431–18449, (2023)

Applied Intelligence

Abstract

Scene recognition is a challenging problem due to intra-class variations and inter-class similarities. Traditional methods and convolutional neural networks (CNNs) represent the global spatial structure, which is suitable for general scene classification and object recognition, but they provide a poor representation for particular indoor or outdoor medium-scale scene datasets. In this manuscript, we study the local and global structures of an image scene and then combine both types of information for indoor and outdoor scenes to improve scene recognition accuracy. Local region structure indicates a sub-part of the scene, such as sky or ground, and global structure indicates the whole scene structure, such as the sky-background-ground outdoor scene type. For this purpose, the multi-layer convolutional features of an inception- and residual-based architecture are used at intermediate and higher layers to preserve both local and global structures of the image scene. Each layer used for feature extraction is connected to global average pooling to obtain a discriminative representation of the image scenes. In this way, local structure is explored at the intermediate convolutional layers, and global spatial structure is obtained from the higher layers. The proposed method is evaluated on the challenging 8-scene, 15-scene, UMC-21, MIT67, and 12-scene datasets, achieving 98.51%, 96.49%, 99.05%, 80.31%, and 84.88% accuracy, respectively, significantly outperforming state-of-the-art approaches.
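
The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature maps are tiny hand-written stand-ins for Inception-ResNet-v2 activations, the names `global_average_pool` and `multi_stage_descriptor` are hypothetical, and joining the pooled vectors by concatenation is an assumption, since the abstract says only that local and global information are combined.

```python
from typing import List

# A feature map is a nested list indexed as [row][column][channel].
FeatureMap = List[List[List[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average every channel over all spatial positions (one value per channel)."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for position in row:
            for c, value in enumerate(position):
                sums[c] += value
    return [s / (height * width) for s in sums]


def multi_stage_descriptor(stages: List[FeatureMap]) -> List[float]:
    """Pool each stage separately, then join the pooled vectors.

    Concatenation is an illustrative assumption; the paper may combine
    the stages differently.
    """
    descriptor: List[float] = []
    for feature_map in stages:
        descriptor.extend(global_average_pool(feature_map))
    return descriptor


# Toy stand-ins: a 2x2 map with 2 channels (intermediate layer, local structure)
# and a 1x1 map with 3 channels (higher layer, global structure).
intermediate = [[[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0], [7.0, 8.0]]]
higher = [[[10.0, 20.0, 30.0]]]

descriptor = multi_stage_descriptor([intermediate, higher])
print(descriptor)  # [4.0, 5.0, 10.0, 20.0, 30.0]
```

Because pooling removes the spatial dimensions, layers with different resolutions each contribute a fixed-length vector, which is what lets intermediate and higher stages be combined at all.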

Author information

Authors and Affiliations

Computer Science Department, Faculty of Computing and Information Technology, University of Narowal, Narowal, Punjab, Pakistan
Altaf Khan
Computer Engineering Department, Faculty of Engineering, Eastern Mediterranean University, TRNC, Mersin 10, Turkey
Alexander Chefranov
Electrical & Electronics Engineering Department, Faculty of Engineering, Eastern Mediterranean University, TRNC, Mersin 10, Turkey
Hasan Demirel

Corresponding author

Correspondence to Altaf Khan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khan, A., Chefranov, A. & Demirel, H. Building discriminative features of scene recognition using multi-stages of inception-ResNet-v2. Appl Intell 53, 18431–18449 (2023). https://doi.org/10.1007/s10489-023-04460-4

Accepted: 08 January 2023
Published: 30 January 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10489-023-04460-4
