ABSTRACT
In this demo, we present a real-time surveillance video parsing (RSVP) system. Surveillance video parsing, which aims to segment video frames into several labels, e.g., face, pants, left leg, has wide applications, especially in the security field. However, annotating every frame in a video is tedious and time-consuming. We therefore design an RSVP system that parses surveillance videos in real time and requires only one labeled frame in the training stage. When parsing a particular frame, the system jointly considers the segmentation of the preceding frames in the video. The RSVP system has proved effective and efficient in real applications.
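The idea of considering preceding frames when parsing the current one can be illustrated with a minimal sketch. It is not the RSVP implementation: the function name `fuse_with_preceding`, the weighted-average fusion rule, and the `current_weight` parameter are assumptions made for illustration, and the preceding frames' score maps are assumed to be already aligned to the current frame.

```python
from typing import List, Sequence

# A score map is a 2-D grid of per-pixel class scores:
# score_map[y][x][c] is the score of class c at pixel (x, y).
ScoreMap = List[List[List[float]]]


def fuse_with_preceding(current: ScoreMap,
                        preceding: Sequence[ScoreMap],
                        current_weight: float = 0.5) -> List[List[int]]:
    """Blend the current frame's scores with those of preceding frames,
    then take the per-pixel argmax to obtain a label map.

    Assumptions (not from the paper): a plain weighted average is used,
    and the preceding maps are already aligned to the current frame.
    """
    if not preceding:
        # Nothing to fuse with: rely on the current frame alone.
        current_weight, prior_weight = 1.0, 0.0
    else:
        # Split the remaining weight evenly over the preceding frames.
        prior_weight = (1.0 - current_weight) / len(preceding)

    labels = []
    for y, row in enumerate(current):
        label_row = []
        for x, scores in enumerate(row):
            fused = [current_weight * s for s in scores]
            for prev in preceding:
                for c, s in enumerate(prev[y][x]):
                    fused[c] += prior_weight * s
            # Index of the highest fused score is the predicted label.
            label_row.append(max(range(len(fused)), key=fused.__getitem__))
        labels.append(label_row)
    return labels


# Hypothetical 1x2 frame with two classes (0 and 1).
current = [[[0.45, 0.55], [0.9, 0.1]]]
prev1 = [[[0.8, 0.2], [0.9, 0.1]]]
prev2 = [[[0.7, 0.3], [0.8, 0.2]]]
fused_labels = fuse_with_preceding(current, [prev1, prev2])
alone_labels = fuse_with_preceding(current, [])
```

In this toy input the first pixel is ambiguous in the current frame alone, and the preceding frames tip the decision toward class 0. A real system would also have to compensate for motion between frames before fusing.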
Index Terms
- RSVP: A Real-Time Surveillance Video Parsing System with Single Frame Supervision