Maximum entropy scaled super pixels segmentation for multi-object detection and scene recognition via deep belief network

Published in: Multimedia Tools and Applications

Abstract

Recent advances in vision technologies have impacted multi-object recognition and scene understanding. Such scene-understanding tasks are a demanding part of several technologies, such as augmented-reality-based scene integration, robotic navigation, autonomous driving and tourist-guide applications. By incorporating visual information into contextually unified segments, super-pixel-based approaches significantly mitigate the clutter that is common in pixel-wise frameworks during scene understanding. Super-pixels allow customized shapes and variable-size patches of connected components to be obtained. Furthermore, the computational time of these segmentation approaches can be significantly decreased because of the reduced number of super-pixel target clusters. Hence, super-pixel-based approaches are widely used in robotics, computer vision and other intelligent systems. In this paper, we propose a Maximum Entropy scaled Super-Pixels (MEsSP) segmentation method that encapsulates super-pixel segmentation based on an entropy model and utilizes local energy terms to label the pixels. Initially, after acquisition and pre-processing, the image is segmented by two different methods: Fuzzy C-Means (FCM) and MEsSP. Then, dynamic geometrical features, fast Fourier transform (FFT) features, blobs, Maximally Stable Extremal Regions (MSER) and KAZE features are extracted from the segmented objects using a bag-of-features approach. Next, multiple kernel learning is applied to categorize the objects. Finally, a deep belief network (DBN) assigns the relevant labels to the scenes based on the categorized objects, intersection-over-union scores and the Dice similarity coefficient. The experimental results for multi-object recognition accuracy, precision, recall and F1 score over the PASCAL VOC, Caltech 101 and UIUC Sports datasets show remarkable performance. In addition, the evaluation of the proposed scene recognition method over these benchmark datasets shows that it outperforms state-of-the-art (SOTA) methods.
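
Since the paper's MEsSP segmentation, multiple-kernel learner and DBN are not released with this page, the sketch below is only a minimal, hedged illustration of the pipeline the abstract outlines: SLIC superpixels stand in for MEsSP, a KAZE bag-of-features vocabulary built with k-means stands in for the full feature set, and a linear SVM stands in for multiple kernel learning and the DBN labelling stage. All function names, parameters and the train_images/train_labels variables are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the pipeline described in the abstract.
# NOT the authors' implementation: MEsSP, multiple kernel learning and the
# DBN are not publicly available, so SLIC superpixels, a KAZE bag-of-features
# vocabulary and a linear SVM are used here as stand-ins (assumptions).
import cv2
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def superpixel_segments(image_bgr, n_segments=200):
    """Superpixel over-segmentation; SLIC stands in for the MEsSP step."""
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    return slic(rgb, n_segments=n_segments, compactness=10, start_label=0)


def kaze_descriptors(image_bgr):
    """Local KAZE descriptors; FFT, blob and MSER features would be added similarly."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = cv2.KAZE_create().detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 64), np.float32)


def build_vocabulary(descriptor_sets, k=100):
    """Bag-of-features codebook: k-means over all training descriptors."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(descriptor_sets))


def bof_histogram(desc, vocab):
    """L1-normalised histogram of visual-word assignments for one segmented object."""
    hist = np.zeros(vocab.n_clusters, dtype=np.float32)
    if len(desc):
        for word in vocab.predict(desc):
            hist[word] += 1.0
        hist /= hist.sum()
    return hist


def iou_and_dice(mask_a, mask_b):
    """Intersection-over-union and Dice similarity for two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return (inter / union if union else 0.0,
            2.0 * inter / total if total else 0.0)


# Usage sketch (train_images and train_labels are assumed to exist):
# descs = [kaze_descriptors(img) for img in train_images]
# vocab = build_vocabulary(descs)
# X = np.array([bof_histogram(d, vocab) for d in descs])
# clf = SVC(kernel="linear").fit(X, train_labels)  # stand-in for MKL + DBN labelling
```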


Acknowledgements

This research was supported by the Ministry of Culture, Sports and Tourism and Korea Creative Content Agency (Project Number: R2021040093).

Author information

Corresponding author

Correspondence to Kibum Kim.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rafique, A.A., Gochoo, M., Jalal, A. et al. Maximum entropy scaled super pixels segmentation for multi-object detection and scene recognition via deep belief network. Multimed Tools Appl 82, 13401–13430 (2023). https://doi.org/10.1007/s11042-022-13717-y
