Architectural style classification based on CNN and channel–spatial attention

  • Original Paper
  • Signal, Image and Video Processing

Abstract

The accurate classification of architectural styles is of great significance to the study of architectural culture and human historical civilization. Models based on convolutional neural networks (CNNs) have achieved highly competitive results in architectural style classification owing to their powerful feature representation capability. However, most CNN models to date extract only the global features of a building facade or focus on a few regions of the building, and fail to capture the spatial features of its different components. To improve the accuracy of architectural style classification, we propose a classification method based on a CNN and channel–spatial attention. First, we add a preprocessing operation before CNN feature extraction to select the main building candidate region in an architectural image, and then use a CNN feature extractor for deep feature extraction. Second, a channel–spatial attention module is introduced to generate an attention map, which not only enhances the texture feature representation of architectural images but also focuses on the spatial features of different architectural elements. Finally, a softmax classifier is used to predict the score of the target class. Experiments on the Architectural Style Dataset and the AHE_Dataset achieve satisfactory performance.
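
To make the pipeline in the abstract concrete, here is a minimal PyTorch sketch of the attention and classification stages (CNN feature extraction, channel–spatial attention, softmax scoring). It assumes a CBAM-style attention module (channel attention followed by spatial attention), a ResNet-50 backbone, and a 25-class output matching the Architectural Style Dataset; these choices, and all names and layer sizes, are illustrative assumptions rather than the authors' exact architecture, and the candidate-region preprocessing step is omitted here.

```python
# Minimal sketch: CNN features -> channel-spatial attention -> class scores.
# CBAM-style module (Woo et al., ECCV 2018): channel attention first, then
# spatial attention. Backbone, layer sizes, and class count are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class ChannelSpatialAttention(nn.Module):
    """Reweight feature channels ("what"), then spatial positions ("where")."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over stacked channel-pooled maps.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention map, shape (B, C, 1, 1).
        chan = torch.sigmoid(
            self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3)))
        ).view(b, c, 1, 1)
        x = x * chan
        # Spatial attention map, shape (B, 1, H, W).
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.conv(pooled))


class StyleClassifier(nn.Module):
    """ResNet-50 features + attention + linear head (softmax at inference)."""

    def __init__(self, num_classes: int = 25):  # assumed 25 style classes
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Drop the average-pool and fc layers to keep spatial feature maps.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attention = ChannelSpatialAttention(channels=2048)
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.attention(self.features(x))  # (B, 2048, H/32, W/32)
        return self.head(f.mean(dim=(2, 3)))  # logits over style classes


# Example: score a (pre-cropped) main-building region resized to 224x224.
logits = StyleClassifier()(torch.randn(1, 3, 224, 224))
probs = torch.softmax(logits, dim=1)  # per-style probabilities
```

In this pattern, the channel attention emphasizes which feature maps (e.g., texture responses) are informative, while the spatial attention emphasizes where the salient architectural elements lie, mirroring the two goals the abstract attributes to the attention module.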

Notes

  1. Architectural Style Dataset: https://sites.google.com/site/zhexuutssjtu/projects/arch.

Acknowledgements

This work was supported by the Natural Science Foundation of Shanxi Province, China (Grant No. 202103021224285), and by the Key Scientific and Technological Innovation Team of Shanxi Province for Big Data Analysis and Parallel Computing, China (Grant No. 201805D131007).

Author information

Corresponding author

Correspondence to Sulan Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, B., Zhang, S., Zhang, J. et al. Architectural style classification based on CNN and channel–spatial attention. SIViP 17, 99–107 (2023). https://doi.org/10.1007/s11760-022-02208-0

