DOI: 10.1145/3561613.3561631
research-article

VSSum: A Virtual Surveillance Dataset for Video Summary

Published: 09 November 2022

Abstract

Video summary can greatly reduce the size of a video while retaining most of its content, which makes it a very promising video analysis technology, especially for surveillance. However, few datasets can be used for video summary because surveillance videos are privacy-sensitive and verbose. To improve the performance of video summary technology on surveillance video, we introduce VSSum, a Virtual Surveillance video dataset, which currently contains 1,000 simulated videos in virtual scenarios, each with a length of 5 minutes. Each video contains one predefined abnormal action to be summarized and multiple normal actions. Moreover, randomness is introduced in several aspects, such as character models, camera angles, action times, and positions. The dataset is controllable, diverse, and large in scale. Considering the characteristics of surveillance video, we propose a baseline model for surveillance video summary called VSSumNet. It uses padding instead of center-cropping and a threshold method instead of a proportion method, and it applies 1D convolution to enhance the continuity of the output. Experimental results show that the baseline outperforms previous methods.
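
The selection ideas named in the abstract can be illustrated with a minimal sketch. It is not the VSSumNet implementation, whose details are not given on this page. It assumes per-frame importance scores are already available, and the function names, the averaging kernel, and the threshold and proportion values are all hypothetical. It shows how a proportion rule always keeps a fixed share of frames, how a threshold rule keeps only frames that score high enough, and how a 1D convolution over the scores can make the selected frames more contiguous.

```python
from typing import List


def smooth_scores(scores: List[float], kernel: List[float]) -> List[float]:
    """Apply a 1D convolution with zero padding so the output keeps the input length.

    Assumes an odd-length, symmetric kernel (hypothetical choice), for which
    convolution and correlation give the same result.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(scores) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(scores))
    ]


def select_by_threshold(scores: List[float], threshold: float) -> List[bool]:
    """Keep every frame whose score reaches the threshold (possibly none)."""
    return [s >= threshold for s in scores]


def select_by_proportion(scores: List[float], proportion: float) -> List[bool]:
    """Keep a fixed fraction of frames, whatever their absolute scores are."""
    budget = int(len(scores) * proportion)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:budget])
    return [i in chosen for i in range(len(scores))]


# Hypothetical frame scores: an event around frames 2 to 5 with one noisy dip.
raw = [0.1, 0.1, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
kernel = [1 / 3, 1 / 3, 1 / 3]  # simple moving average, an assumed kernel

fragmented = select_by_threshold(raw, 0.5)
contiguous = select_by_threshold(smooth_scores(raw, kernel), 0.5)

# A video with no notable event: the two rules behave differently.
quiet = [0.05] * 10
kept_by_threshold = select_by_threshold(quiet, 0.5)
kept_by_proportion = select_by_proportion(quiet, 0.2)
```

In this toy example, thresholding the raw scores yields a selection broken by the dip at frame 3, while thresholding the smoothed scores yields one contiguous run. On the quiet video the threshold rule keeps nothing, whereas the proportion rule still keeps two frames, which is one plausible reason a threshold suits surveillance footage where abnormal actions are rare.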

Cited By

  • (2024) Personalized Video Summarization: A Comprehensive Survey of Methods and Datasets. Applied Sciences 14:11 (4400). https://doi.org/10.3390/app14114400. Online publication date: 22-May-2024

Published In

ICCCV '22: Proceedings of the 5th International Conference on Control and Computer Vision
August 2022
241 pages
ISBN:9781450397315
DOI:10.1145/3561613
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DNN
  2. dataset
  3. surveillance
  4. video summary
  5. virtual

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Fundamental Research Funds for the Central University, China

Conference

ICCCV 2022
