VSSum: A Virtual Surveillance Dataset for Video Summary

Authors:

Fang Ren

ICCCV '22: Proceedings of the 5th International Conference on Control and Computer Vision

Pages 113 - 119

https://doi.org/10.1145/3561613.3561631

Published: 09 November 2022

Abstract

Video summarization can greatly reduce the size of a video while retaining most of its content, which makes it a promising video analysis technology, especially for surveillance. However, few datasets are available for surveillance video summarization because surveillance footage is privacy-sensitive and verbose. To improve the performance of video summarization on surveillance video, we introduce VSSum, a Virtual Surveillance video dataset, which currently contains 1,000 simulated videos of virtual scenarios, each 5 minutes long. Each video contains one predefined abnormal action to be summarized and multiple normal actions. Randomness is introduced in several aspects, such as character models, camera angles, and the timing and position of actions. The dataset is controllable, diverse, and large-scale. Considering the characteristics of surveillance video, we propose a baseline model for surveillance video summarization called VSSumNet. It uses padding in place of center-cropping and a threshold in place of a fixed proportion, and it applies 1D convolution to enhance the continuity of the output. Experimental results show that the baseline outperforms previous methods.
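The selection step described in the abstract, a threshold rather than a fixed proportion, combined with 1D convolution for continuity, can be illustrated with a small sketch. This is not VSSumNet itself: the per-frame scores, the averaging kernel, the 0.5 threshold, and every function name below are assumptions made for illustration, and the abstract does not say where the convolution sits in the network.

```python
from typing import List


def smooth_scores(scores: List[float], kernel: List[float]) -> List[float]:
    """Apply a 1D convolution with zero padding; output length equals input length.

    The kernel is applied without flipping, as in deep-learning conv layers.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(scores) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(scores))
    ]


def select_by_threshold(scores: List[float], threshold: float) -> List[int]:
    """Keep every frame whose score reaches the threshold; summary length varies."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def select_by_proportion(scores: List[float], proportion: float) -> List[int]:
    """Keep a fixed fraction of the highest-scoring frames, whatever the content."""
    budget = max(1, int(len(scores) * proportion))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(top)


# Hypothetical per-frame scores: one abnormal event with a one-frame dropout.
raw = [0.0, 0.0, 0.9, 0.9, 0.1, 0.9, 0.9, 0.0, 0.0, 0.0]
smoothed = smooth_scores(raw, kernel=[1 / 3, 1 / 3, 1 / 3])

print(select_by_threshold(raw, 0.5))       # gap at frame 4
print(select_by_threshold(smoothed, 0.5))  # one continuous segment
print(select_by_proportion(raw, 0.2))      # truncated to the frame budget
```

On these made-up scores, thresholding the raw values leaves a gap inside the event, smoothing first yields a single continuous segment, and a fixed proportion cuts the event down to the budget regardless of how long it actually lasts.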

Cited By

M. Peronikolis and C. Panagiotakis. 2024. Personalized Video Summarization: A Comprehensive Survey of Methods and Datasets. Applied Sciences 14, 11 (2024), 4400. Online publication date: 22 May 2024. https://doi.org/10.3390/app14114400

Index Terms

VSSum: A Virtual Surveillance Dataset for Video Summary
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
      2. Image and video acquisition

Published In

ICCCV '22: Proceedings of the 5th International Conference on Control and Computer Vision

August 2022

241 pages

ISBN:9781450397315

DOI:10.1145/3561613

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Fundamental Research Funds for the Central University, China

Conference

ICCCV 2022

ICCCV 2022: 2022 The 5th International Conference on Control and Computer Vision

August 19 - 21, 2022

Xiamen, China
