ABSTRACT
There has recently been an explosion of interest in creating large-scale shared virtual spaces for multiplayer content. However, rendering player-controllable avatars in real time creates latency issues when scaling to thousands of players. We introduce a human audience video dataset to support applications in deep learning-based 2D video audience simulation, bypassing the need for background 3D virtual humans. The dataset consists of YouTube videos that depict audiences with diverse lighting conditions, color, dress, and movement patterns. We describe the dataset statistics, our implicit data collection strategy, and our audience video extraction pipeline. We apply deep learning video prediction techniques to this data and propose a novel method for 2D audience simulation.
Index Terms
- Towards Learning and Generating Audience Motion from Video