End-to-end video background subtraction with 3d convolutional neural networks

Sakkos, Dimitrios; Liu, Heng; Han, Jungong; Shao, Ling

doi:10.1007/s11042-017-5460-9

Published: 11 December 2017

Volume 77, pages 23023–23041, (2018)

Multimedia Tools and Applications

Dimitrios Sakkos ORCID: orcid.org/0000-0002-2382-8244¹,
Heng Liu²,
Jungong Han³ &
…
Ling Shao⁴

2361 Accesses
71 Citations
Abstract

Background subtraction in videos is a highly challenging task by definition, as it operates at the level of pixel-wise classification. Therefore, great attention to detail is essential. In this paper, we follow the success of deep learning in computer vision and present an end-to-end system for background subtraction in videos. Our model tracks temporal changes in a video sequence by applying 3D convolutions to the most recent frames of the video. Thus, no background model needs to be retained and updated. In addition, it can handle multiple scenes without further fine-tuning on each scene individually. We evaluate our system on the largest dataset for change detection, CDnet, which contains more than 50 videos spanning 11 categories. Further evaluation is performed on the ESI dataset, which features extreme and sudden illumination changes. Our model surpasses the state of the art on both datasets according to the average ranking of the models over a wide range of metrics.
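To make the idea of applying a 3D convolution to the most recent frames concrete, the following is a minimal sketch in plain Python. It is not the architecture described in the paper: the function name `conv3d_valid`, the toy 4×4 frames, the hand-set temporal kernel, and the 0.25 threshold are all assumptions chosen for illustration, whereas a trained network would learn many such kernels from data.

```python
def conv3d_valid(volume, kernel):
    """Valid (no padding) 3D cross-correlation.

    volume is indexed as volume[t][y][x], kernel as kernel[dt][dy][dx].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical input: three 4x4 grayscale frames with a static background
# of 0.5 and one pixel that brightens in the most recent frame.
frames = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]
frames[2][1][2] = 1.0

# Hand-set 3x1x1 kernel (an assumption, not a learned filter): the newest
# frame minus the mean of the two previous frames.
kernel = [[[-0.5]], [[-0.5]], [[1.0]]]

response = conv3d_valid(frames, kernel)  # one 4x4 response plane
# Illustrative threshold turning the response into a foreground mask.
mask = [[1 if v > 0.25 else 0 for v in row] for row in response[0]]
```

Because the kernel spans the time axis, the response depends only on the short stack of recent frames, which is why this style of model does not need a separately maintained background image.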

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, NE1 8ST, UK
Dimitrios Sakkos
School of Computer Science and of Technology, Anhui University of Technology, Anhui Sheng, 243032, China
Heng Liu
School of Computing and Communications, Lancaster University, Lancaster, LA1 4YW, UK
Jungong Han
School of Computer Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
Ling Shao

Corresponding author

Correspondence to Jungong Han.

About this article

Cite this article

Sakkos, D., Liu, H., Han, J. et al. End-to-end video background subtraction with 3d convolutional neural networks. Multimed Tools Appl 77, 23023–23041 (2018). https://doi.org/10.1007/s11042-017-5460-9

Received: 31 July 2017
Revised: 27 October 2017
Accepted: 26 November 2017
Published: 11 December 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-017-5460-9
