
Multi-Level Ensemble Network for Scene Recognition

Published in: Multimedia Tools and Applications

Abstract

Scene recognition is an important branch of computer vision and a common task for deep learning. Different scenes are characterized by different "key objects", so a neural network for scene recognition must extract the features of these key objects, and sometimes must also integrate the positional relations between objects, to determine the class to which a scene belongs. In some scenes, the key objects are very small, and their features become extremely inconspicuous, or even disappear, in the deep layers of the network; we call these "small object-supported scenes". In this paper, we propose the Multi-Level Ensemble Network (MLEN), a convolutional neural network designed to improve recognition accuracy on such scenes. Features from multiple levels of the network are used to make separate predictions, and ensemble learning is then performed within the network to produce the final prediction. In addition, a "Feature Transfer Path" is added and feature-fusion methods are adopted to make full use of both low-level and high-level features. Moreover, we design a class-weight loss function for the problem of non-uniform class distribution, which further improves accuracy on most scene recognition datasets. The experiments involve the Urban Management Case (UMC) dataset, which we collated from two smart urban-management system databases, and the Places-mini dataset, a subset of the well-known Places dataset [36]. The results show that our Multi-Level Ensemble Network achieves much higher accuracy than state-of-the-art scene recognition networks on both datasets.
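The abstract does not specify the exact form of the class-weight loss or of the in-network ensemble rule. As an illustration only, the sketch below (plain NumPy; all function names are hypothetical, not from the paper) shows one common realization of the two ideas: per-class weights inversely proportional to class frequency, a weighted cross-entropy over a batch, and a simple unweighted average of the per-level predictions.

```python
import numpy as np

def class_weights(labels, n_classes):
    """Weights inversely proportional to class frequency,
    scaled so the average weight over the dataset is 1."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1))

def weighted_cross_entropy(probs, labels, weights):
    """Mean over the batch of -w[y] * log p(y), i.e. cross-entropy
    where each sample is weighted by its class weight."""
    eps = 1e-12  # numerical guard against log(0)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-weights[labels] * np.log(picked + eps)))

def ensemble_predict(level_probs):
    """Average the softmax outputs of all prediction levels
    (a simple unweighted within-network ensemble)."""
    return np.mean(np.stack(level_probs), axis=0)
```

With this weighting, majority classes contribute less per sample and minority classes more, which is one standard way to counteract a non-uniform class distribution during training.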


(Figures 1–13 appear in the full article.)


References

  1. Bertinetto L, Valmadre J, Henriques JF, et al. (2016) Fully-Convolutional Siamese Networks for Object Tracking[C]// European Conference on Computer Vision. Springer International Publishing, 850–865

  2. Chen Y, Li J, Xiao H, et al. (2017) Dual Path Networks[J]

  3. Cheng Z, Shen J (2016) On very large scale test collection for landmark image search benchmarking[J]. Signal Processing, 124:13–26

  4. Cheng Z, Chang X, et al. (2018) MMALFM: Explainable Recommendation by Leveraging Reviews and Images[J]. ACM Transactions on Information Systems

  5. Danelljan M, Bhat G, Khan FS, et al. (2016) ECO: Efficient Convolution Operators for Tracking[J]. 6931–6939

  6. Ding G, Chen W et al (2018) Real-Time Scalable Visual Tracking via Quadrangle Kernelized Correlation Filters[J]. IEEE Trans Intell Transp Syst 19(1):140–150


  7. Fan H, Ling H (2017) SANet: Structure-Aware Network for Visual Tracking[C]// Computer Vision and Pattern Recognition Workshops. IEEE, 2217–2224

  8. George M, Dixit M, Zogg G, et al. (2016) Semantic Clustering for Robust Fine-Grained Scene Recognition[M]// Computer Vision – ECCV 2016. Springer International Publishing, 783–798.

  9. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks[J]. J Mach Learn Res 9:249–256


  10. Hariharan B, Arbeláez P, Girshick R et al (2014) Simultaneous Detection and Segmentation[C]// European Conference on Computer Vision. Springer, Cham, pp 297–312


  11. He K, Zhang X, Ren S, et al. (2016) Deep Residual Learning for Image Recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, :770–778.

  12. He K, Gkioxari G, Dollar P et al. (2017) Mask R-CNN[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, PP(99):1–1

  13. Herranz L, Jiang S, Li X (2016) Scene Recognition with CNNs: Objects, Scales and Dataset Bias[C]// Computer Vision and Pattern Recognition. IEEE, 571–579

  14. Hu J, Shen L, Sun G (2017) Squeeze-and-Excitation Networks[J]

  15. Huang G, Liu Z, van der Maaten L, et al. (2016) Densely Connected Convolutional Networks[J]. 2261–2269

  16. Ioffe S, Szegedy C (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift[J]. 448–456

  17. Jia Y, Shelhamer E, Donahue J et al. (2014) Caffe: Convolutional Architecture for Fast Feature Embedding[J].

  18. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc. 1097–1105.

  19. Lécun Y, Bottou L, Bengio Y et al (1998) Gradient-based learning applied to document recognition[J]. Proc IEEE 86(11):2278–2324


  20. Li Y, Qi H, Dai J, et al. (2016) Fully Convolutional Instance-aware Semantic Segmentation[J]. 4438–4446

  21. Romera-Paredes B, Torr PHS (2016) Recurrent Instance Segmentation[C]// European Conference on Computer Vision. Springer International Publishing, 312–329

  22. Shen L, Lin Z, Huang Q (2016) Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks[C]// European Conference on Computer Vision. Springer International Publishing, 467–482

  23. Szegedy C, Liu W, Jia Y, et al. (2015) Going deeper with convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–9

  24. Szegedy C, Vanhoucke V, Ioffe S, et al. (2016) Rethinking the Inception Architecture for Computer Vision[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2818–2826

  25. Szegedy C, Ioffe S, Vanhoucke V, et al. (2016) Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning[J]

  26. Wang L, Ouyang W, Wang X et al. (2016) Visual Tracking with Fully Convolutional Networks[C]// IEEE International Conference on Computer Vision. IEEE, 3119–3127

  27. Xie S, Girshick R, Dollar P, et al. (2017) Aggregated Residual Transformations for Deep Neural Networks[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 5987–5995

  28. Yan C, Tu Y, Wang X, et al. (2019) STAT: Spatial-Temporal Attention Mechanism for Video Captioning, IEEE Transactions on Multimedia

  29. Yan C, Li L, Zhang C, et al. (2019) Cross-modality Bridging and Knowledge Transferring for Image Understanding, IEEE Transactions on Multimedia

  30. Zagoruyko S, Komodakis N (2016) Wide Residual Networks[J]

  31. Zeiler MD, Fergus R (2014) Visualizing and Understanding Convolutional Networks[J]. 8689:818–833

  32. Zhao S, Yao H et al. (2016) Continuous Probability Distribution Prediction of Image Emotions via Multi-Task Shared Sparse Regression[J]. IEEE Transactions on Multimedia, PP(99):1–1

  33. Zhao S, Yao H, et al. (2016) Predicting Personalized Image Emotion Perceptions in Social Networks[J]. IEEE Transactions on Affective Computing, 1–1

  34. Zhao S, Gao Y, et al. (2017) Real-Time Multimedia Social Event Detection in Microblog[J]. IEEE Transactions on Cybernetics, 1–14

  35. Zhou B, Lapedriza A, Xiao J, et al. (2014) Learning deep features for scene recognition using places database[C]// International Conference on Neural Information Processing Systems. MIT Press, 487–495

  36. Zhou B, Lapedriza A, Khosla A, et al. (2018) Places: A 10 million Image Database for Scene Recognition.[J]. IEEE Trans Pattern Anal Mach Intell, PP(99):1–1


Author information


Corresponding author

Correspondence to Huihua Yang.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, L., Li, L., Pan, X. et al. Multi-Level Ensemble Network for Scene Recognition. Multimed Tools Appl 78, 28209–28230 (2019). https://doi.org/10.1007/s11042-019-07933-2
