
A multi-scale inputs and labels model for background subtraction

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Background subtraction, which aims to segment moving objects from the background, is a fundamental and challenging task in computer vision. Recently, the attention mechanism has become a hot topic in neural networks, and algorithms based on encoder-decoder and multi-scale networks achieve impressive results in background subtraction. In this paper, we propose a multi-scale inputs and labels (MSIL) model built on an encoder-decoder network with channel attention. The multi-scale fusion encoding (MSFE) module exploits multi-scale inputs effectively by fusing high-level and low-level feature details. The channel attention (CA) module connects the encoder and decoder to model channel-wise attention. The multi-label supervision decoding (MLSD) module learns richer hierarchical features and achieves better performance through the new multi-label supervision. Evaluated on the CDnet-2014 and LASIESTA datasets, the proposed model demonstrates its effectiveness and superiority with average F-Measures of 0.9851 and 0.9633, respectively. In addition, scene-independent evaluation experiments on the CDnet-2014 dataset demonstrate the effectiveness of the model on unseen videos.
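The abstract does not detail the channel attention (CA) module beyond its role of gating channels between encoder and decoder. As a rough illustration only, the sketch below implements a generic squeeze-and-excitation-style channel gate in plain Python: squeeze each channel to a scalar by global average pooling, excite through a two-layer bottleneck, and rescale the channel. The function name and the random weights are hypothetical stand-ins, not the paper's implementation.

```python
import math
import random

def channel_attention(feature_map, reduction=4, seed=0):
    """SE-style channel attention sketch over a list of 2D channels.
    Random weights stand in for learned parameters."""
    rnd = random.Random(seed)
    c = len(feature_map)
    mid = max(1, c // reduction)
    # Squeeze: global average pool, one scalar descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_map]
    # Excite: bottleneck MLP, ReLU then sigmoid (illustrative weights)
    w1 = [[rnd.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(mid)]
    w2 = [[rnd.uniform(-0.1, 0.1) for _ in range(mid)] for _ in range(c)]
    s = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z))) for wi in w1]
    gate = [1.0 / (1.0 + math.exp(-sum(wij * sj for wij, sj in zip(wi, s))))
            for wi in w2]
    # Rescale: multiply every pixel of channel k by its gate value
    return [[[v * gate[k] for v in row] for row in ch]
            for k, ch in enumerate(feature_map)]

features = [[[1.0] * 4 for _ in range(4)] for _ in range(8)]  # 8 channels, 4x4
out = channel_attention(features)
print(len(out), len(out[0]), len(out[0][0]))  # 8 4 4
```

Because the sigmoid gate lies strictly in (0, 1), each channel is attenuated rather than amplified; in a trained network the gate values would reflect learned channel importance rather than random weights.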


Figures 1–4 are available in the full article.


Data availability

All data generated or analyzed during this study are included in this article.


Acknowledgements

The authors would like to thank the creators of the CDnet-2014 and LASIESTA datasets, which allowed us to train and evaluate the proposed model.

Funding

This work was supported by the National Natural Science Foundation of China under Grants 61674049 and U19A2053, and by the Fundamental Research Funds for the Central Universities of China under Grant JZ2021HGQA0262.

Author information

Authors and Affiliations

Authors

Contributions

YY supervised the project; DL and XL mainly conducted the experiments and collected and analyzed the data; ZZ and GX provided guidance on the algorithms and experiments; YY, DL and XL wrote the main manuscript; all authors discussed the results and commented on and revised the manuscript.

Corresponding authors

Correspondence to Yizhong Yang or Guangjun Xie.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, Y., Li, D., Li, X. et al. A multi-scale inputs and labels model for background subtraction. SIViP 17, 4133–4141 (2023). https://doi.org/10.1007/s11760-023-02645-5


