Abstract
Background subtraction is a fundamental and challenging task in computer vision that aims to segment moving objects from the background. Recently, the attention mechanism has become a popular topic in neural network research, and algorithms based on encoder-decoder and multi-scale networks have achieved impressive results in background subtraction. In this paper, we propose a multi-scale inputs and labels (MSIL) model based on an encoder-decoder network and channel attention. The multi-scale fusion encoding (MSFE) module exploits multi-scale inputs effectively, fusing high-level and low-level feature details. The channel attention (CA) module connects the encoder and decoder to model channel-wise attention. The multi-label supervision decoding (MLSD) module learns richer hierarchical features and achieves better performance through the new multi-label supervision. The proposed model is evaluated on the CDnet-2014 and LASIESTA datasets, where it demonstrates its effectiveness and superiority with average F-Measures of 0.9851 and 0.9633, respectively. In addition, scene-independent evaluation experiments on the CDnet-2014 dataset demonstrate the effectiveness of the model on unseen videos.
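A common realization of the channel attention idea mentioned above is the squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a small bottleneck, and use a sigmoid gate to rescale the channels. The NumPy sketch below illustrates that pattern only; the weight shapes, reduction ratio, and function name `channel_attention` are illustrative assumptions, not the paper's exact CA module.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    features: array of shape (C, H, W)
    w1: squeeze weights of shape (C, C // r)  -- r is a reduction ratio
    w2: excitation weights of shape (C // r, C)
    Returns the input with each channel rescaled by a learned gate in (0, 1).
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gate -> (C,)
    s = np.maximum(z @ w1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(s @ w2)))
    # Recalibrate: scale each channel map by its gate value
    return features * gate[:, None, None]

# Minimal usage with random weights (C=8, H=W=4, reduction ratio r=4)
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((8, 2))
w2 = rng.standard_normal((2, 8))
y = channel_attention(x, w1, w2)
```

In the paper's architecture this gating sits between encoder and decoder, letting the network emphasize the feature channels most informative for separating foreground from background.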
Data availability
All data generated or analyzed during this study are included in this article.
Acknowledgements
The authors would like to thank the providers of the CDnet-2014 and LASIESTA datasets, which allowed us to train and evaluate the proposed model.
Funding
This work was supported by the National Natural Science Foundation of China under Grants 61674049 and U19A2053, and by the Fundamental Research Funds for the Central Universities of China under Grant JZ2021HGQA0262.
Author information
Authors and Affiliations
Contributions
YY supervised the project; DL and XL mainly conducted experiments, and collected and analyzed the data; ZZ and GX provided guidance in the algorithms and experiments; YY, DL and XL wrote the main manuscript; All authors discussed the results, commented on and revised the manuscript.
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Li, D., Li, X. et al. A multi-scale inputs and labels model for background subtraction. SIViP 17, 4133–4141 (2023). https://doi.org/10.1007/s11760-023-02645-5