
Deep Learning-based Moving Object Segmentation: Recent Progress and Research Prospects

  • Review
  • Published in: Machine Intelligence Research

Abstract

Moving object segmentation (MOS), which aims to segment moving objects from video frames, is an important and challenging computer vision task with a wide range of applications. With the development of deep learning (DL), MOS has entered the era of deep models built around spatiotemporal feature learning. This paper provides an up-to-date review of DL-based MOS methods proposed during the past three years. Specifically, we present a categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL), model training, and evaluation. For FL, the reviewed methods are divided into three types: spatial FL, temporal FL, and spatiotemporal FL. Each type is then analyzed in terms of its inputs and model architecture; three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, starting from the established distinction between scene-dependent and scene-independent evaluation, and further considering whether the test videos are recorded with static or moving cameras, we define four subdivided evaluation setups and identify which setup each reviewed method adopts. We also compare the performance of representative MOS methods and analyze the technical advantages and disadvantages of the reviewed approaches. Finally, based on these comparisons and discussions, we present research prospects and future directions.
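
To make the task concrete, the sketch below shows classical per-pixel background subtraction, the non-deep baseline that the reviewed DL methods aim to improve upon: video frames go in, and a foreground mask marking moving pixels comes out. It is a minimal Python example using OpenCV's MOG2 Gaussian-mixture subtractor; the video path is a placeholder, and the parameter values are illustrative defaults rather than settings taken from any reviewed method.

    import cv2

    # Placeholder input path; any surveillance-style video works.
    cap = cv2.VideoCapture("video.mp4")

    # MOG2 maintains an adaptive per-pixel Gaussian mixture as the
    # background model; pixels that deviate from it become foreground.
    subtractor = cv2.createBackgroundSubtractorMOG2(
        history=500, varThreshold=16, detectShadows=True)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Foreground mask: 255 = moving object, 127 = shadow, 0 = background.
        fg_mask = subtractor.apply(frame)
        # Morphological opening suppresses isolated noise pixels.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
        cv2.imshow("moving objects", fg_mask)
        if cv2.waitKey(30) & 0xFF == 27:  # press Esc to stop
            break

    cap.release()
    cv2.destroyAllWindows()

Deep MOS models keep this frames-in, mask-out contract but replace the hand-crafted per-pixel statistics with learned spatial, temporal, or spatiotemporal features.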



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Nos. 61702323 and 62172268), the Shanghai Municipal Natural Science Foundation, China (No. 20ZR1423100), the Open Fund of Science and Technology on Thermal Energy and Power Laboratory (No. TPL2020C02), Wuhan 2nd Ship Design and Research Institute, Wuhan, China, the National Key Research and Development Program of China (No. 2018YFB1306303), and the Major Basic Research Projects of the Natural Science Foundation of Shandong Province, China (No. ZR2019ZD07). The authors are grateful to the reviewers for their valuable comments that have improved the quality of this paper. Rui Jiang is grateful to Mrs. Xia-Hong Xu for her support of this research.

Author information


Corresponding author

Correspondence to Hu Su.

Additional information

Rui Jiang received the B.Sc. degree in computational mathematics from Xi'an University of Technology (XUT), China in 2006, the M.Sc. degree in applied mathematics from Xi'an Jiaotong University (XJTU), China in 2010, and the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2016. She is now an assistant professor with College of Information Engineering, Shanghai Maritime University (SMU), China, and a visiting scholar with School of Computer Science and Technology, East China Normal University, China. She is a member of IEEE.

Her research interests include machine learning, pattern recognition and computer vision.

Ruixiang Zhu received the B.Eng. degree in software engineering from Shandong University of Technology (SDUT), China in 2020. He is currently a master's student in computer technology with College of Information Engineering, Shanghai Maritime University (SMU), China.

His research interests include deep learning and computer vision.

Hu Su received the B.Sc. and M.Sc. degrees in information and computation science from Shandong University (SDU), China in 2007 and 2010, respectively, and the Ph.D. degree in control science and engineering from State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2013. Currently, he is an associate researcher with Research Center of Precision Sensing and Control, CASIA, China.

His research interests include intelligent control and optimization, and computer vision.

Yinlin Li received the B.Sc. degree in measurement and control technology and instrumentation from Xidian University, China in 2011, and the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2016. She is currently an associate professor with State Key Laboratory of Management and Control for Complex Systems, CASIA, China.

Her research interests include robotic vision, biologically inspired visual algorithms and robotic manipulation.

Yuan Xie received the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2013. He is currently a full professor with School of Computer Science and Technology, East China Normal University, China. He has published around 80 papers in major international journals and conferences, including IJCV, TPAMI, TIP, CVPR, ECCV, ICCV, ICML, NeurIPS, AAAI and IJCAI. He received the Hong Kong Scholar Award from the Society of Hong Kong Scholars and the China National Postdoctoral Council in 2014.

His research interests include image processing, computer vision, machine learning and pattern recognition.

Wei Zou received the B.Sc. degree in control theory and control engineering from Inner Mongolia University of Science and Technology, China in 1997, the M.Sc. degree in control theory and control engineering from Shandong University, China in 2000, and the Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2003. Since 2012, he has been a researcher with Research Center of Precision Sensing and Control, CASIA, China.

His research interests include visual control and intelligent robots.


About this article


Cite this article

Jiang, R., Zhu, R., Su, H. et al. Deep Learning-based Moving Object Segmentation: Recent Progress and Research Prospects. Mach. Intell. Res. 20, 335–369 (2023). https://doi.org/10.1007/s11633-022-1378-4
