
Stage classification using two-stream deep convolutional neural networks

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Stage classification is an important task for scene understanding, 3D TV, autonomous vehicles, and object localization. Images can be categorized into a limited number of 3D scene geometries, called stages, each of which has a unique depth pattern that provides a specific context for the objects in the scene. Convolutional neural networks (CNNs) have shown strong performance in scene classification owing to their powerful feature learning and reasoning capabilities. We further observe that the edge-preserving Laplacian filter (LF), based on Laplacian pyramids, enhances the edge details of a scene image and can thereby improve stage classification performance. We therefore introduce a novel stage classification method based on a two-stream CNN model, in which one stream receives LF-encoded images, the other receives the original RGB images, and their outputs are fused at the decision level. The proposed method is evaluated on two stage datasets: 'stage-1209', containing 1209 images, and '12-scene', containing 12,000 images. The results show that LF-encoded images have a positive influence on stage classification accuracy, and that fusion with the product rule yields the largest improvement on both datasets; in particular, it improves stage classification accuracy on the 12-scene dataset by 7.96% compared with the state-of-the-art method.
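As a rough illustration of the decision-level fusion described in the abstract, the Python sketch below builds two independent CNN streams and combines their softmax posteriors with the product rule. The ResNet-18 backbone, the class count, and all names are illustrative assumptions, not the authors' actual configuration.

```python
# Hedged sketch of the two-stream design: one CNN per input encoding,
# fused at the decision level with the product rule. Backbone choice,
# class count, and helper names are assumptions for illustration only.
import torch
import torch.nn.functional as F
from torchvision import models

NUM_STAGES = 12  # e.g. the '12-scene' dataset defines 12 stage classes

def make_stream(num_classes: int) -> torch.nn.Module:
    """One stream: an ImageNet-pretrained backbone with a fresh classifier head."""
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    net.fc = torch.nn.Linear(net.fc.in_features, num_classes)
    return net

rgb_stream = make_stream(NUM_STAGES)  # trained on the original RGB images
lf_stream = make_stream(NUM_STAGES)   # trained on the LF-encoded images

@torch.no_grad()
def predict(rgb_batch: torch.Tensor, lf_batch: torch.Tensor) -> torch.Tensor:
    """Product-rule fusion: multiply the per-class posteriors, then argmax."""
    p_rgb = F.softmax(rgb_stream(rgb_batch), dim=1)
    p_lf = F.softmax(lf_stream(lf_batch), dim=1)
    return (p_rgb * p_lf).argmax(dim=1)  # fused stage prediction per image
```

Because the product rule only multiplies normalized scores, the fusion step introduces no extra learned parameters; each test image is simply passed through both streams, once as RGB and once LF-encoded.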


Notes

  1. LF source code: https://people.csail.mit.edu/sparis/publi/2011/siggraph/.
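The note above links to the reference implementation of the local Laplacian filter. As a much simpler stand-in, the sketch below illustrates only the underlying Laplacian-pyramid idea: decompose the image into band-pass layers, amplify those detail layers, and collapse the pyramid. The pyramid depth, gain, and file name are arbitrary assumptions, and this is not the edge-aware algorithm of the linked code, which manipulates pyramid coefficients locally rather than with one global gain.

```python
# Simplified Laplacian-pyramid detail boost (illustration only; the actual
# local Laplacian filter is edge-aware and processes coefficients locally).
import cv2
import numpy as np

def enhance_edges(img: np.ndarray, levels: int = 4, gain: float = 1.5) -> np.ndarray:
    """Amplify the band-pass (edge/detail) layers of a Laplacian pyramid."""
    img = img.astype(np.float32)
    # Gaussian pyramid: repeatedly blur and downsample.
    gauss = [img]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    # Laplacian layers: each level minus the upsampled coarser level.
    lap = [
        gauss[i] - cv2.pyrUp(gauss[i + 1],
                             dstsize=(gauss[i].shape[1], gauss[i].shape[0]))
        for i in range(levels)
    ]
    # Collapse the pyramid, boosting every detail layer by a fixed gain.
    out = gauss[-1]
    for i in reversed(range(levels)):
        out = cv2.pyrUp(out, dstsize=(lap[i].shape[1], lap[i].shape[0]))
        out = out + gain * lap[i]
    return np.clip(out, 0, 255).astype(np.uint8)

lf_image = enhance_edges(cv2.imread("scene.jpg"))  # placeholder input path
```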


Author information


Corresponding author

Correspondence to Altaf Khan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOC 1312 KB)


About this article


Cite this article

Chefranov, A., Khan, A. & Demirel, H. Stage classification using two-stream deep convolutional neural networks. SIViP 16, 311–319 (2022). https://doi.org/10.1007/s11760-021-01911-8

