Where Are They Going? Predicting Human Behaviors in Crowded Scenes

Authors:

Niccolo Bisagno,

Francesco G. B. De Natale,

Hongbo Liu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 4

Article No.: 123, Pages 1 - 19

https://doi.org/10.1145/3449359

Published: 12 November 2021

Abstract

In this article, we propose a framework for crowd behavior prediction in complicated scenarios. The framework follows the standard encoder-decoder scheme and is built on long short-term memory (LSTM) modules to capture the temporal evolution of crowd behaviors. To model interactions among humans and their environment, we embed both social and physical attention mechanisms into the LSTM. The social attention component models the interactions among pedestrians, whereas the physical attention component captures the spatial configuration of the scene. Since pedestrians’ behaviors are multi-modal, we use a generative model to produce multiple acceptable future paths. The proposed framework not only predicts an individual’s trajectory accurately but also forecasts ongoing group behaviors by leveraging the coherent filtering approach. Experiments on standard crowd benchmarks (namely, the ETH, UCY, CUHK crowd, and CrowdFlow datasets) demonstrate that the proposed framework is effective in forecasting crowd behaviors in complex scenarios.
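
As a rough illustration of the social attention idea described in the abstract, the sketch below pools the hidden states of neighboring pedestrians into a single context vector using softmax weights. The abstract does not specify the scoring function, so the scaled dot-product score used here is an assumption, and the names `softmax`, `dot`, and `social_attention` are hypothetical helpers rather than the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def social_attention(target: Vector, neighbors: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (attention weights, context vector) for one pedestrian.

    `target` stands in for the pedestrian's LSTM hidden state and
    `neighbors` for the hidden states of nearby pedestrians.
    ASSUMPTION: scores are scaled dot products; the paper may score
    neighbors differently.
    """
    if not neighbors:
        # No one nearby: empty weights and a zero context vector.
        return [], [0.0] * len(target)
    scale = math.sqrt(len(target))
    scores = [dot(target, n) / scale for n in neighbors]
    weights = softmax(scores)
    # Weighted sum of neighbor states, one dimension at a time.
    context = [
        sum(w * n[i] for w, n in zip(weights, neighbors))
        for i in range(len(target))
    ]
    return weights, context


if __name__ == "__main__":
    w, c = social_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
    print("weights:", w)
    print("context:", c)
```

In a full model, a context vector of this kind would be combined with the pedestrian's own state at each decoding step; a physical attention branch could be sketched the same way, with scene feature vectors in place of neighbor states.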

Index Terms

Where Are They Going? Predicting Human Behaviors in Crowded Scenes
1. Computing methodologies
  1. Artificial intelligence

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 4

November 2021

529 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3492437

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2021

Accepted: 01 February 2021

Revised: 01 November 2020

Received: 01 February 2020

Published in TOMM Volume 17, Issue 4

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
China Postdoctoral Science Foundation
National Natural Science Foundation of China
Liaoning Collaborative Fund
