Abstract
Stage classification is an important task for scene understanding, 3D TV, autonomous vehicles, and object localization. Images can be categorized into a limited number of 3D scene geometries, called stages, each of which has a unique depth pattern that provides a specific context for the objects in the scene. Convolutional neural networks (CNNs) have shown high performance in scene classification owing to their powerful feature learning and reasoning capabilities. In addition, we found that the edge-preserving Laplacian filter (LF), which is based on Laplacian pyramids and enhances the edge details of a scene image, can improve the performance of stage classification. We introduce a novel stage classification method based on a two-stream CNN model in which one stream takes LF-encoded images, the other takes normal RGB images, and their outputs are fused at the decision level. The proposed method is evaluated on two stage datasets: 'stage-1209', which contains 1209 images, and '12-scene', which contains 12,000 images. The results show that LF-encoded images have a positive influence on stage classification accuracy. Fusion with the product rule yields the largest improvement on both datasets; in particular, it improves stage classification accuracy on the 12-scene dataset by 7.96% compared to the state-of-the-art method.
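The decision-level fusion described above can be illustrated with a minimal sketch. The abstract names the product rule as the best-performing combiner; a common realization (assumed here, not taken from the paper's code) multiplies the two streams' softmax probability vectors element-wise and renormalizes before taking the argmax. The stream names and probability values below are hypothetical:

```python
import numpy as np

def product_rule_fusion(probs_a, probs_b):
    """Fuse two classifiers' class-probability vectors by the product rule
    and renormalize so the fused vector sums to 1."""
    fused = np.asarray(probs_a, dtype=float) * np.asarray(probs_b, dtype=float)
    return fused / fused.sum()

# Hypothetical softmax outputs of the two streams for a 3-class stage problem
rgb_stream = np.array([0.5, 0.3, 0.2])  # stream fed with normal RGB images
lf_stream = np.array([0.6, 0.1, 0.3])   # stream fed with LF-encoded images
fused = product_rule_fusion(rgb_stream, lf_stream)
predicted_stage = int(np.argmax(fused))
```

The product rule tends to favor classes on which both streams agree, since a low probability from either stream suppresses the fused score for that class.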
Notes
LF source code: https://people.csail.mit.edu/sparis/publi/2011/siggraph/.
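The linked code implements the edge-aware local Laplacian filter of Paris et al. For orientation, the Laplacian pyramid decomposition that underlies it can be sketched with plain NumPy. This is a simplified sketch only (box-filter downsampling and nearest-neighbour upsampling stand in for Gaussian resampling, and no edge-aware remapping is applied); it is not the authors' LF encoding:

```python
import numpy as np

def downsample(img):
    # 2x2 box average followed by decimation (crude stand-in for Gaussian blur + subsample)
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour 2x expansion
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return the band-pass (detail) levels plus the low-frequency residual."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    pyramid.append(current)  # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    # Invert the decomposition: upsample the residual and add back each detail band
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current) + band
    return current
```

With this construction the decomposition is exactly invertible, which is why detail bands can be manipulated (e.g. to enhance edges) and the image rebuilt afterwards.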
Cite this article
Chefranov, A., Khan, A. & Demirel, H. Stage classification using two-stream deep convolutional neural networks. SIViP 16, 311–319 (2022). https://doi.org/10.1007/s11760-021-01911-8