ABSTRACT
Video summarization can greatly reduce the size of a video while retaining most of its content, which makes it a promising video analysis technology, especially for surveillance. However, few datasets are available for surveillance video summarization because surveillance videos raise privacy concerns and are lengthy. To improve the performance of video summarization on surveillance video, we introduce VSSum, a Virtual Surveillance video dataset that currently contains 1,000 simulated videos, each 5 minutes long, recorded in virtual scenarios. Each video contains one predefined abnormal action to be summarized and multiple normal actions. Randomness is introduced in several aspects, such as character models, camera angles, action times, and positions. The dataset is controllable, diverse, and large in scale. Considering the characteristics of surveillance video, we propose VSSumNet, a baseline model for surveillance video summarization. It replaces center-cropping with padding and the proportion method with a threshold method, and it uses 1D convolution to improve the continuity of the output. Experimental results show that the baseline outperforms previous methods.
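To make the difference between threshold-based and proportion-based selection concrete, the following is a minimal sketch and not the VSSumNet implementation. It assumes the model emits one importance score per frame; the function names, the fixed averaging kernel (the 1D convolution in the model is presumably learned), and the threshold values are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: names, kernel, and thresholds are assumptions,
# not taken from the paper.

def smooth_1d(scores, kernel):
    """Same-length 1D convolution with zero padding (symmetric, odd-length kernel)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(scores) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(scores))
    ]


def select_by_threshold(scores, threshold):
    """Keep every frame whose score reaches the threshold (may keep none)."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def select_by_proportion(scores, ratio):
    """Keep a fixed fraction of the highest-scoring frames (always keeps some)."""
    k = max(1, round(len(scores) * ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-frame scores: one abnormal action with a brief dip at frame 3.
scores = [0.1, 0.1, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1]
kernel = [1 / 3, 1 / 3, 1 / 3]

raw_selection = select_by_threshold(scores, 0.5)
smooth_selection = select_by_threshold(smooth_1d(scores, kernel), 0.35)
quiet_selection = select_by_threshold([0.1] * 8, 0.35)
fixed_selection = select_by_proportion([0.1] * 8, 0.15)

print(raw_selection)     # fragmented selection around the dip
print(smooth_selection)  # one contiguous segment after smoothing
print(quiet_selection)   # a threshold can return an empty summary
print(fixed_selection)   # a proportion always returns at least one frame
```

In this toy setting, smoothing the scores before thresholding merges the two isolated high-score frames into one contiguous segment, and a threshold can return an empty summary for a clip with no abnormal action, whereas a fixed proportion always selects some frames.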
Index Terms
- VSSum: A Virtual Surveillance Dataset for Video Summary