Abstract
Moving object detection in video surveillance systems is a critical task for many computer vision applications. Extracting objects from real-world surveillance video nevertheless remains challenging, because variation in object appearance, shape, and lighting conditions is still an unsolved problem. Exploiting context from adjacent frames is believed to be valuable for tackling these challenges in surveillance video, yet it remains underexplored. In this paper, we introduce a novel one-stage approach, named SDNN, that detects objects using multiple successive video frames. Specifically, the network fuses context information from multiple frames and combines predictions from multi-scale feature maps at different layers. The multi-frame feature fusion scheme allows the network to be trained end to end. Experiments on a surveillance video dataset show that the proposed SDNN achieves state-of-the-art results. The source code is available at https://github.com/jmuyjl/SDNN.
The first author is a student. This work is supported by the National Natural Science Foundation of China under Grant No. 61702251 and the Key Technical Project of Fujian Province under Grant No. 2017H6015.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, J. et al. (2019). Semi-supervised Deep Neural Networks for Object Detection in Video Surveillance Systems. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11857. Springer, Cham. https://doi.org/10.1007/978-3-030-31654-9_27
DOI: https://doi.org/10.1007/978-3-030-31654-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31653-2
Online ISBN: 978-3-030-31654-9
eBook Packages: Computer Science, Computer Science (R0)