demonstration

RSVP: A Real-Time Surveillance Video Parsing System with Single Frame Supervision

Authors:
Han Yu (SKLOIS, IIE, CAS, Beijing, China)
Guanghui Ren (SKLOIS, IIE, CAS, Beijing, China)
Ruihe Qian (SKLOIS, IIE, CAS, Beijing, China)
Yao Sun (SKLOIS, IIE, CAS, Beijing, China)
Changhu Wang (Toutiao AI Lab, Beijing, China)
Hanqing Lu (IA, CAS, Beijing, China)
Si Liu (SKLOIS, IIE, CAS, Beijing, China)

MM '17: Proceedings of the 25th ACM international conference on Multimedia, October 2017, Pages 1257–1258, https://doi.org/10.1145/3123266.3127928

Published: 19 October 2017


ABSTRACT

In this demo, we present a real-time surveillance video parsing (RSVP) system. Surveillance video parsing, which aims to segment video frames into several labels, e.g., face, pants, left-leg, has wide applications, especially in the security field. However, annotating every frame in a video is tedious and time-consuming. The RSVP system therefore requires only one labeled frame in the training stage and parses surveillance videos in real time. When parsing a particular frame, it jointly considers the segmentation of the preceding frames in the video. The system has proved effective and efficient in real applications.
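
One way to picture how the segmentation of a preceding frame can inform the current frame is to warp the preceding frame's per-pixel label probabilities into the current frame and blend them with the current frame's own prediction. The sketch below is illustrative only and is not taken from the RSVP system: it assumes a dense optical-flow field is already available, uses nearest-neighbour sampling and a fixed blending weight, and the names `warp_probs`, `fuse_and_label`, and `prev_weight` are hypothetical.

```python
import numpy as np


def warp_probs(prev_probs, flow):
    """Warp label probabilities from the preceding frame to the current one.

    prev_probs: array of shape (C, H, W), one probability map per label.
    flow: array of shape (H, W, 2); flow[y, x] = (dx, dy) is assumed to point
          from a pixel in the current frame to its match in the preceding frame.
    """
    _, h, w = prev_probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup; positions outside the image are clamped to the border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev_probs[:, src_y, src_x]


def fuse_and_label(cur_probs, warped_prev_probs, prev_weight=0.5):
    """Blend current and warped preceding probabilities, then pick a label per pixel."""
    fused = (1.0 - prev_weight) * cur_probs + prev_weight * warped_prev_probs
    return fused.argmax(axis=0)
```

A fixed `prev_weight` is the simplest possible choice; a real system would more plausibly learn how much to trust each preceding frame, and would use a smoother interpolation than nearest-neighbour sampling.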

Supplemental Material

demo54.mp4 (MP4, 10.6 MB)


Index Terms

1. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Parsing


Published in
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266
General Chairs:
Qiong Liu (FXPAL, USA)
Rainer Lienhart (Universität Augsburg, Germany)
Haohong Wang (TCL America, USA)
Program Chairs:
Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan)
Susanne Boll (University of Oldenburg, Germany)
Phoebe Chen (La Trobe University, Australia)
Gerald Friedland (Lawrence Livermore National Lab, USA)
Jia Li (Google, USA)
Shuicheng Yan (Qihoo 360, China)
Copyright © 2017 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep learning
human parsing
surveillance video parsing system
Qualifiers
- demonstration
Acceptance Rates
MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%