Abstract
Stage classification is an important task for scene understanding, 3D TV, autonomous vehicles, and object localization. Images can be categorized into a limited number of 3D scene geometries, called stages, each of which has a unique depth pattern that provides a specific context for the objects in the scene. Convolutional neural networks (CNNs) have shown high performance in scene classification owing to their powerful feature learning and reasoning capabilities. In addition, we found that the edge-preserving Laplacian filter (LF), which is based on Laplacian pyramids and enhances the edge details of a scene image, can improve the performance of stage classification. We introduce a novel stage classification method based on a two-stream CNN model in which one stream takes LF-encoded images, the other takes normal RGB images, and their outputs are fused at the decision level. The proposed method is evaluated on two stage datasets: 'stage-1209', which contains 1209 images, and '12-scene', which contains 12,000 images. The results show that LF-encoded images have a positive influence on stage classification accuracy. Fusion with the product rule yields the largest improvement on both datasets; in particular, it improves stage classification accuracy on the 12-scene dataset by 7.96% compared to the state-of-the-art method.
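The decision-level fusion described above can be illustrated with a minimal sketch. The abstract names the product rule as the best-performing combiner; a common realization (assumed here, not taken from the paper's code) multiplies the two streams' softmax probability vectors element-wise and renormalizes before taking the argmax. The stream names and probability values below are hypothetical:

```python
import numpy as np

def product_rule_fusion(probs_a, probs_b):
    """Fuse two classifiers' class-probability vectors by the product rule
    and renormalize so the fused vector sums to 1."""
    fused = np.asarray(probs_a, dtype=float) * np.asarray(probs_b, dtype=float)
    return fused / fused.sum()

# Hypothetical softmax outputs of the two streams for a 3-class stage problem
rgb_stream = np.array([0.5, 0.3, 0.2])  # stream fed with normal RGB images
lf_stream = np.array([0.6, 0.1, 0.3])   # stream fed with LF-encoded images
fused = product_rule_fusion(rgb_stream, lf_stream)
predicted_stage = int(np.argmax(fused))
```

The product rule tends to favor classes on which both streams agree, since a low probability from either stream suppresses the fused score for that class.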
Notes
LF source code: https://people.csail.mit.edu/sparis/publi/2011/siggraph/.
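The linked code implements the edge-aware local Laplacian filter of Paris et al. For orientation, the Laplacian pyramid decomposition that underlies it can be sketched with plain NumPy. This is a simplified sketch only (box-filter downsampling and nearest-neighbour upsampling stand in for Gaussian resampling, and no edge-aware remapping is applied); it is not the authors' LF encoding:

```python
import numpy as np

def downsample(img):
    # 2x2 box average followed by decimation (crude stand-in for Gaussian blur + subsample)
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour 2x expansion
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return the band-pass (detail) levels plus the low-frequency residual."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    pyramid.append(current)  # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    # Invert the decomposition: upsample the residual and add back each detail band
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current) + band
    return current
```

With this construction the decomposition is exactly invertible, which is why detail bands can be manipulated (e.g. to enhance edges) and the image rebuilt afterwards.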
Cite this article
Chefranov, A., Khan, A. & Demirel, H. Stage classification using two-stream deep convolutional neural networks. SIViP 16, 311–319 (2022). https://doi.org/10.1007/s11760-021-01911-8