DOI: 10.1145/3561613.3561631
research-article

VSSum: A Virtual Surveillance Dataset for Video Summary

Published: 09 November 2022

ABSTRACT

Video summary can greatly reduce the size of a video while retaining most of its content, which makes it a promising video analysis technology, especially for surveillance. However, few datasets are available for video summary because surveillance videos are privacy-sensitive and verbose. To improve the performance of video summary technology on surveillance video, we introduce VSSum, a Virtual Surveillance video dataset, which currently contains 1,000 simulated videos of virtual scenarios, each 5 minutes long. Each video contains one predefined abnormal action to be summarized and multiple normal actions. Randomness is introduced in several aspects, such as character models, camera angles, action times, and positions. The dataset is controllable, diverse, and large-scale. Considering the characteristics of surveillance video, we propose a baseline model for surveillance video summary called VSSumNet. It uses padding instead of center-cropping and a threshold method instead of a proportion method, and it applies a 1D convolution to enhance the continuity of the output. Experimental results show that the baseline outperforms previous methods.
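
The abstract names two output-side ideas: selecting frames by a score threshold rather than by a fixed proportion, and using a 1D convolution to make the output more continuous. The following is a minimal sketch of those two ideas on a list of per-frame scores. It is not the VSSumNet implementation: the function names, the fixed averaging kernel, the edge padding, and the example scores are all assumptions made for illustration, and in a trained network the 1D convolution would normally have learned weights.

```python
from typing import List


def smooth_scores(scores: List[float], kernel: List[float]) -> List[float]:
    """Slide an odd-length kernel over the scores, keeping the output length.

    Assumption: the ends are padded by repeating the first/last score.
    The kernel is applied without flipping, which is identical to a
    convolution for symmetric kernels such as the averaging kernel below.
    """
    half = len(kernel) // 2
    padded = [scores[0]] * half + list(scores) + [scores[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(scores))
    ]


def select_by_threshold(scores: List[float], threshold: float) -> List[int]:
    """Keep every frame whose score reaches the threshold (may keep none)."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def select_by_proportion(scores: List[float], proportion: float) -> List[int]:
    """Keep a fixed fraction of frames, whatever their scores are."""
    budget = max(1, int(len(scores) * proportion))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical per-frame importance scores: one event with a dip at frame 3.
raw = [0.1, 0.1, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1]
smoothed = smooth_scores(raw, [1 / 3, 1 / 3, 1 / 3])

fragmented = select_by_threshold(raw, 0.35)       # frames 2 and 4 only
continuous = select_by_threshold(smoothed, 0.35)  # frames 1 through 5

# A clip with no high-scoring frames: thresholding keeps nothing,
# while a proportion rule still keeps a quarter of the frames.
quiet = [0.1] * 8
kept_by_threshold = select_by_threshold(quiet, 0.35)
kept_by_proportion = select_by_proportion(quiet, 0.25)
```

In this toy example the smoothing closes the one-frame gap inside the event, so the selected frames form a single contiguous segment. The `quiet` clip shows one plausible reason to prefer a threshold for surveillance footage, where the abnormal action may occupy only a small and variable part of the video; the abstract itself does not state this motivation.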


      • Published in

        ICCCV '22: Proceedings of the 5th International Conference on Control and Computer Vision
        August 2022
        241 pages
        ISBN:9781450397315
        DOI:10.1145/3561613

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited