
Deep Learning-based Moving Object Segmentation: Recent Progress and Research Prospects

  • Review
  • Published in: Machine Intelligence Research

Abstract

Moving object segmentation (MOS), which aims to segment moving objects from video frames, is an important and challenging computer vision task with a wide range of applications. With the development of deep learning (DL), MOS has entered the era of deep models built around spatiotemporal feature learning. This paper provides an up-to-date review of DL-based MOS methods proposed during the past three years. Specifically, we present a categorization based on model characteristics, then compare and discuss each category from the perspectives of feature learning (FL), model training, and evaluation. For FL, the reviewed methods are divided into three types: spatial FL, temporal FL, and spatiotemporal FL. Each type is then analyzed in terms of its inputs and model architecture; three input types and four typical preprocessing subnetworks are summarized. In terms of training, we discuss ideas for enhancing model transferability. In terms of evaluation, starting from the established distinction between scene-dependent and scene-independent evaluation, and further considering whether the test videos are recorded with static or moving cameras, we define four subdivided evaluation setups and identify which setup each reviewed method adopts. We also compare the performance of representative MOS methods and analyze the technical advantages and disadvantages of the reviewed approaches. Finally, based on these comparisons and discussions, we present research prospects and future directions.
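
To make the task concrete, the sketch below shows classical per-pixel background subtraction, the non-deep baseline that the reviewed DL methods aim to improve upon: video frames go in, and a foreground mask marking moving pixels comes out. It is a minimal Python example using OpenCV's MOG2 Gaussian-mixture subtractor; the video path is a placeholder, and the parameter values are illustrative defaults rather than settings taken from any reviewed method.

    import cv2

    # Placeholder input path; any surveillance-style video works.
    cap = cv2.VideoCapture("video.mp4")

    # MOG2 maintains an adaptive per-pixel Gaussian mixture as the
    # background model; pixels that deviate from it become foreground.
    subtractor = cv2.createBackgroundSubtractorMOG2(
        history=500, varThreshold=16, detectShadows=True)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Foreground mask: 255 = moving object, 127 = shadow, 0 = background.
        fg_mask = subtractor.apply(frame)
        # Morphological opening suppresses isolated noise pixels.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
        cv2.imshow("moving objects", fg_mask)
        if cv2.waitKey(30) & 0xFF == 27:  # press Esc to stop
            break

    cap.release()
    cv2.destroyAllWindows()

Deep MOS models keep this frames-in, mask-out contract but replace the hand-crafted per-pixel statistics with learned spatial, temporal, or spatiotemporal features.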



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Nos. 61702323 and 62172268), the Shanghai Municipal Natural Science Foundation, China (No. 20ZR1423100), the Open Fund of Science and Technology on Thermal Energy and Power Laboratory (No. TPL2020C02), Wuhan 2nd Ship Design and Research Institute, Wuhan, China, the National Key Research and Development Program of China (No. 2018YFB1306303), and the Major Basic Research Projects of the Natural Science Foundation of Shandong Province, China (No. ZR2019ZD07). The authors are grateful to the reviewers for their valuable comments that have improved the quality of this paper. Rui Jiang is grateful to Mrs. Xia-Hong Xu for her support of this research.

Author information


Corresponding author

Correspondence to Hu Su.

Additional information

Rui Jiang received the B.Sc. degree in computational mathematics from Xi'an University of Technology (XUT), China in 2006, the M.Sc. degree in applied mathematics from Xi'an Jiaotong University (XJTU), China in 2010, and the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2016. She is now an assistant professor with College of Information Engineering, Shanghai Maritime University (SMU), China, and a visiting scholar with School of Computer Science and Technology, East China Normal University, China. She is a member of IEEE.

Her research interests include machine learning, pattern recognition and computer vision.

Ruixiang Zhu received the B.Eng. degree in software engineering from Shandong University of Technology (SDUT), China in 2020. He is currently a master's student in computer technology with College of Information Engineering, Shanghai Maritime University (SMU), China.

His research interests include deep learning and computer vision.

Hu Su received the B.Sc. and M.Sc. degrees in information and computation science from Shandong University (SDU), China in 2007 and 2010, respectively, and the Ph.D. degree in control science and engineering from State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2013. Currently, he is an associate researcher with Research Center of Precision Sensing and Control, CASIA, China.

His research interests include intelligent control and optimization, and computer vision.

Yinlin Li received the B.Sc. degree in measurement and control technology and instrumentation from Xidian University, China in 2011, and the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2016. She is currently an associate professor with State Key Laboratory of Management and Control for Complex Systems, CASIA, China.

Her research interests include robotic vision, biologically inspired visual algorithms and robotic manipulation.

Yuan Xie received the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2013. He is currently a full professor with School of Computer Science and Technology, East China Normal University, China. He has published around 80 papers in major international journals and conferences, including IJCV, TPAMI, TIP, CVPR, ECCV, ICCV, ICML, NeurIPS, AAAI and IJCAI. He received the Hong Kong Scholar Award from the Society of Hong Kong Scholars and the China National Postdoctoral Council in 2014.

His research interests include image processing, computer vision, machine learning and pattern recognition.

Wei Zou received the B.Sc. degree in control theory and control engineering from Inner Mongolia University of Science and Technology, China in 1997, the M.Sc. degree in control theory and control engineering from Shandong University, China in 2000, and the Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2003. Since 2012, he has been a researcher with Research Center of Precision Sensing and Control, CASIA, China.

His research interests include visual control and intelligent robots.


About this article


Cite this article

Jiang, R., Zhu, R., Su, H. et al. Deep Learning-based Moving Object Segmentation: Recent Progress and Research Prospects. Mach. Intell. Res. 20, 335–369 (2023). https://doi.org/10.1007/s11633-022-1378-4
