Multi-Level Ensemble Network for Scene Recognition

Zhang, Longhao; Li, Lingqiao; Pan, Xipeng; Cao, Zhiwei; Chen, Qianyu; Yang, Huihua

doi:10.1007/s11042-019-07933-2

Published: 08 July 2019

Volume 78, pages 28209–28230, (2019)

Multimedia Tools and Applications

Longhao Zhang¹, Lingqiao Li¹, Xipeng Pan¹, Zhiwei Cao¹, Qianyu Chen¹ & Huihua Yang¹ (ORCID: orcid.org/0000-0001-6334-4044)

Abstract

Scene recognition is an important branch of computer vision and a common deep learning task. Different scenes are characterized by different “key objects”, so a network for scene recognition must extract the features of these objects and sometimes also integrate the positional relations between them to determine the class of a scene. In some scenes the key objects are very small, and their features become extremely inconspicuous or even vanish in the deep layers of the network. We refer to these as “small object-supported scenes”. In this paper we propose the Multi-Level Ensemble Network (MLEN), a convolutional neural network designed to improve recognition accuracy on such scenes. Features from multiple levels of the network are used to make separate predictions, and ensemble learning is then performed within the network to produce the final prediction. In addition, a “Feature Transfer Path” is added and feature fusion methods are adopted to make full use of both low-level and high-level features. We also design a class-weight loss function to address non-uniform class distributions, which helps further improve accuracy on most scene recognition datasets. Experiments are conducted on the Urban Management Case (UMC) dataset, which we collated from two smart urban management system databases, and on Places-mini, a subset of the well-known Places dataset [36]. The results show that MLEN achieves much higher accuracy than state-of-the-art scene recognition networks on both datasets.
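
The abstract names two mechanisms without giving formulas: combining separate predictions made at several network levels, and weighting the loss by class to counter a non-uniform class distribution. The sketch below illustrates one plausible reading of each in plain Python. It is not the authors' implementation. The averaging rule in `ensemble_predict`, the inverse-frequency formula in `inverse_frequency_weights`, and all function names are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert one level's raw scores into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(level_logits):
    """Combine per-level predictions into one distribution.

    ASSUMPTION: a plain average of the per-level softmax outputs.
    The abstract only says ensemble learning is performed within the net.
    """
    probs = [softmax(logits) for logits in level_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def inverse_frequency_weights(class_counts):
    """Per-class weights w_c = N / (K * n_c).

    ASSUMPTION: inverse-frequency weighting, a common choice for
    imbalanced data; the paper's actual weighting is not given here.
    """
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * n) for n in class_counts]


def class_weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])


if __name__ == "__main__":
    # Three hypothetical levels (shallow, middle, deep), two classes.
    levels = [[2.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    final = ensemble_predict(levels)
    weights = inverse_frequency_weights([90, 10])  # class 1 is rare
    print(final, class_weighted_cross_entropy(final, 1, weights))
```

Under these assumptions, a misclassified sample from a rare class contributes more to the loss than one from a frequent class, and a shallow level that still "sees" a small key object can influence the final prediction even when the deepest level does not.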

References

Bertinetto L, Valmadre J, Henriques JF, et al. (2016) Fully-Convolutional Siamese Networks for Object Tracking[C]// European Conference on Computer Vision. Springer International Publishing, 850–865
Chen Y, Li J, Xiao H, et al. (2017) Dual Path Networks[J]
Cheng Z, Shen J (2016) On very large scale test collection for landmark image search benchmarking[J]. Signal Processing, 124:13–26
Cheng Z, Chang X, et al. (2018) MMALFM: Explainable Recommendation by Leveraging Reviews and Images[J]. ACM Transactions on Information Systems
Danelljan M, Bhat G, Khan FS, et al. (2016) ECO: Efficient Convolution Operators for Tracking[J]. 6931–6939
Ding G, Chen W et al (2018) Real-Time Scalable Visual Tracking via Quadrangle Kernelized Correlation Filters[J]. IEEE Trans Intell Transp Syst 19(1):140–150
Fan H, Ling H (2017) SANet: Structure-Aware Network for Visual Tracking[C]// Computer Vision and Pattern Recognition Workshops. IEEE, 2217–2224
George M, Dixit M, Zogg G, et al. (2016) Semantic Clustering for Robust Fine-Grained Scene Recognition[M]// Computer Vision – ECCV 2016. Springer International Publishing, 783–798.
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks[J]. J Mach Learn Res 9:249–256
Hariharan B, Arbeláez P, Girshick R et al (2014) Simultaneous Detection and Segmentation[C]// European Conference on Computer Vision. Springer, Cham, pp 297–312
He K, Zhang X, Ren S, et al. (2016) Deep Residual Learning for Image Recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 770–778
He K, Gkioxari G, Dollar P et al. (2017) Mask R-CNN[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, PP(99):1–1
Herranz L, Jiang S, Li X (2016) Scene Recognition with CNNs: Objects, Scales and Dataset Bias[C]// Computer Vision and Pattern Recognition. IEEE, 571–579
Hu J, Shen L, Sun G (2017) Squeeze-and-Excitation Networks[J]
Huang G, Liu Z, van der Maaten L, et al. (2016) Densely Connected Convolutional Networks[J]. 2261–2269
Ioffe S, Szegedy C (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift[J]. 448–456
Jia Y, Shelhamer E, Donahue J et al. (2014) Caffe: Convolutional Architecture for Fast Feature Embedding[J].
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc. 1097–1105.
Lécun Y, Bottou L, Bengio Y et al (1998) Gradient-based learning applied to document recognition[J]. Proc IEEE 86(11):2278–2324
Li Y, Qi H, Dai J, et al. (2016) Fully Convolutional Instance-aware Semantic Segmentation[J]. 4438–4446
Romera-Paredes B, Torr PHS (2016) Recurrent Instance Segmentation[C]// European Conference on Computer Vision. Springer International Publishing, 312–329
Shen L, Lin Z, Huang Q (2016) Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks[C]// European Conference on Computer Vision. Springer International Publishing, 467–482
Szegedy C, Liu W, Jia Y, et al. (2015) Going deeper with convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–9
Szegedy C, Vanhoucke V, Ioffe S, et al. (2016) Rethinking the Inception Architecture for Computer Vision[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, et al. (2016) Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning[J]
Wang L, Ouyang W, Wang X et al. (2016) Visual Tracking with Fully Convolutional Networks[C]// IEEE International Conference on Computer Vision. IEEE, 3119–3127
Xie S, Girshick R, Dollar P, et al. (2017) Aggregated Residual Transformations for Deep Neural Networks[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 5987–5995
Yan C, Tu Y, Wang X, et al. (2019) STAT: Spatial-Temporal Attention Mechanism for Video Captioning, IEEE Transactions on Multimedia
Yan C, Li L, Zhang C, et al. (2019) Cross-modality Bridging and Knowledge Transferring for Image Understanding, IEEE Transactions on Multimedia
Zagoruyko S, Komodakis N (2016) Wide Residual Networks[J]
Zeiler MD, Fergus R (2014) Visualizing and Understanding Convolutional Networks[J]. 8689:818–833
Zhao S, Yao H et al. (2016) Continuous Probability Distribution Prediction of Image Emotions via Multi-Task Shared Sparse Regression[J]. IEEE Transactions on Multimedia, PP(99):1–1
Zhao S, Yao H, et al. (2016) Predicting Personalized Image Emotion Perceptions in Social Networks[J]. IEEE Transactions on Affective Computing, 1–1
Zhao S, Gao Y, et al. (2017) Real-Time Multimedia Social Event Detection in Microblog[J]. IEEE Transactions on Cybernetics, 1–14
Zhou B, Lapedriza A, Xiao J, et al. (2014) Learning deep features for scene recognition using places database[C]// International Conference on Neural Information Processing Systems. MIT Press, 487–495
Zhou B, Lapedriza A, Khosla A, et al. (2018) Places: A 10 million Image Database for Scene Recognition[J]. IEEE Trans Pattern Anal Mach Intell, PP(99):1–1

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing Shi, China
Longhao Zhang, Lingqiao Li, Xipeng Pan, Zhiwei Cao, Qianyu Chen & Huihua Yang

Corresponding author

Correspondence to Huihua Yang.

Ethics declarations

Conflict of Interest

The authors declared that they have no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, L., Li, L., Pan, X. et al. Multi-Level Ensemble Network for Scene Recognition. Multimed Tools Appl 78, 28209–28230 (2019). https://doi.org/10.1007/s11042-019-07933-2

Received: 25 October 2018
Revised: 16 June 2019
Accepted: 24 June 2019
Published: 08 July 2019
Issue Date: 15 October 2019
DOI: https://doi.org/10.1007/s11042-019-07933-2
