Where Are They Going? Predicting Human Behaviors in Crowded Scenes

Published: 12 November 2021

Abstract

In this article, we propose a framework for crowd behavior prediction in complicated scenarios. The framework follows the standard encoder-decoder scheme and is built on long short-term memory (LSTM) modules to capture the temporal evolution of crowd behaviors. To model interactions among humans and their environment, we embed both social and physical attention mechanisms into the LSTM. The social attention component models the interactions among different pedestrians, whereas the physical attention component helps to understand the spatial configuration of the scene. Since pedestrians’ behaviors are multi-modal, we use a generative model to produce multiple acceptable future paths. The proposed framework not only predicts an individual’s trajectory accurately but also forecasts ongoing group behaviors by leveraging the coherent filtering approach. Experiments on standard crowd benchmarks (namely, the ETH, UCY, CUHK Crowd, and CrowdFlow datasets) demonstrate that the proposed framework is effective in forecasting crowd behaviors in complex scenarios.
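
The abstract does not state the attention equations, so the following is only a rough illustration of what a social attention step could look like. It assumes scaled dot-product scoring between a target pedestrian's hidden state and each neighbor's hidden state; the scoring function, the names `softmax` and `social_attention`, and the toy vectors are all hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def social_attention(query: Sequence[float],
                     neighbors: Sequence[Sequence[float]]) -> List[float]:
    """Return an attention-weighted sum of neighbor hidden states.

    Each neighbor is scored by its scaled dot product with the query
    (an assumed scoring rule, not the article's formulation).
    """
    if not neighbors:
        # No neighbors in the scene: contribute a zero context vector.
        return [0.0] * len(query)
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, hidden)) / scale
              for hidden in neighbors]
    weights = softmax(scores)
    return [sum(w * hidden[d] for w, hidden in zip(weights, neighbors))
            for d in range(len(query))]


# Toy vectors standing in for LSTM encoder hidden states.
target = [1.0, 0.0]
others = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context = social_attention(target, others)
print(context)
```

In a sketch like this, the resulting context vector would be concatenated with the pedestrian's own hidden state before decoding; a physical attention step could follow the same pattern with scene features in place of neighbor states.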

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
    November 2021
    529 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3492437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 November 2021
    Accepted: 01 February 2021
    Revised: 01 November 2020
    Received: 01 February 2020
    Published in TOMM Volume 17, Issue 4

    Author Tags

    1. Crowd analysis
    2. behavior prediction
    3. attention mechanism
    4. multi-modality modeling
    5. pedestrian grouping

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • China Postdoctoral Science Foundation
    • Liaoning Collaborative Fund
