
Recognition of varying size scene images using semantic analysis of deep activation maps

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

Understanding the complex semantic structure of scene images requires mapping an image from pixel space to a high-level semantic space. In semantic space, a scene image is represented by the posterior probabilities of the concepts (e.g., ‘car,’ ‘chair,’ ‘window’) present in it; this representation is known as the semantic multinomial (SMN) representation. Generating SMNs requires a concept-annotated dataset for concept modeling, which is infeasible to create manually because of the large size of scene databases. To tackle this issue, we propose a novel approach that builds the concept models from pseudo-concepts. A pseudo-concept acts as a proxy for an actual concept, giving a cue for its presence without revealing its actual identity. We propose to use filter responses from the deeper convolutional layers of convolutional neural networks (CNNs) as pseudo-concepts, since filters in deeper convolutional layers are trained for different semantic concepts. Most prior work considers fixed-size (\(\approx \)227\(\times \)227) images for semantic analysis, which suppresses many concepts present in the images. In this work, we preserve the true concept structure of images by passing them at their original resolution through the convolutional layers of a CNN. We further propose to prune non-prominent pseudo-concepts, group similar ones using kernel clustering, and model the resulting groups using dynamic kernel-based support vector machines. We demonstrate that the resulting SMN representation indeed captures semantic concepts better and yields state-of-the-art classification accuracy on varying size scene image datasets such as MIT67 and SUN397.
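The prune-then-group stage of the pipeline above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: random arrays stand in for the deep activation maps of one variable-size image, the pruning criterion (peak response above the mean peak) and the RBF kernel are illustrative assumptions, and plain kernel k-means stands in for the paper's kernel clustering step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for deep activation maps of one image at its original
# resolution: 64 filters (candidate pseudo-concepts), each with a 14x12
# spatial response map. In the paper these come from a deeper conv layer.
acts = rng.random((64, 14, 12))

# Step 1: prune non-prominent pseudo-concepts -- drop filters whose peak
# response falls below the mean peak (an assumed, illustrative threshold).
peak = acts.reshape(64, -1).max(axis=1)
kept = acts[peak > peak.mean()]

# Step 2: group similar pseudo-concepts with kernel k-means. Each surviving
# filter is described by its flattened response map; an RBF kernel stands
# in for the paper's kernel choice.
X = kept.reshape(len(kept), -1)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / sq.mean())

def kernel_kmeans(K, k, iters=20, seed=0):
    """Plain kernel k-means (Dhillon et al.): reassign each point to the
    cluster whose mean it is closest to in the kernel-induced space."""
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(0, k, n)
    for _ in range(iters):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = labels == c
            if not idx.any():
                continue  # empty cluster: leave its distances at +inf
            m = idx.sum()
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, idx].sum(axis=1) / m
                          + K[np.ix_(idx, idx)].sum() / m ** 2)
        new = dist.argmin(axis=1)
        if (new == labels).all():
            break
        labels = new
    return labels

groups = kernel_kmeans(K, k=5)
print(len(kept), "pseudo-concepts kept in", len(set(groups)), "groups")
```

In the paper, each resulting group would then be modeled with a dynamic kernel-based SVM to produce the per-concept posteriors that make up the SMN vector.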







Author information

Corresponding author: Shikha Gupta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gupta, S., Dileep, A.D. & Thenkanidiyoor, V. Recognition of varying size scene images using semantic analysis of deep activation maps. Machine Vision and Applications 32, 52 (2021). https://doi.org/10.1007/s00138-021-01168-8
