
Strategies for simulating pedestrian navigation with multiple reinforcement learning agents

Autonomous Agents and Multi-Agent Systems

Abstract

In this paper, a new multi-agent reinforcement learning approach is introduced for the simulation of pedestrian groups. Unlike other solutions, where the behaviors of the pedestrians are coded into the system, in our approach the agents learn by interacting with the environment. The embodied agents must learn to control their velocity, avoiding obstacles and the other pedestrians, to reach a goal inside the scenario. The main contribution of this paper is a new methodology that uses different iterative learning strategies, combining vector quantization (state-space generalization) with the Q-learning algorithm (VQQL). Two algorithmic schemas, Iterative VQQL and Incremental, which differ in how they address the problems, have been designed and used with and without transfer of knowledge. These algorithms are tested and compared with the VQQL algorithm as a baseline in two scenarios where agents need to solve well-known problems in pedestrian modeling. In the first, agents in a closed room need to reach the single exit, producing and resolving a bottleneck. In the second, two groups of agents inside a corridor need to reach goals placed at opposite ends (they need to solve the crossing). In the first scenario, we focus on scalability, use metrics from the pedestrian-modeling field, and compare with Helbing’s social force model. The emergence of collective behaviors, namely the shell-shaped clogging in front of the exit in the first scenario and lane formation as a solution to the crossing problem in the second, has been obtained and analyzed. The results demonstrate that the proposed schemas find policies that carry out the tasks, suggesting that they are applicable and generalizable to the simulation of pedestrian groups.
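
The VQQL combination named in the abstract couples a vector-quantization codebook over the continuous state space with tabular Q-learning over the resulting discrete codes. The sketch below illustrates that idea in Python; the environment interface (reset/step), the k-means style codebook construction, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the VQQL idea: vector quantization for state-space
# generalization, then tabular Q-learning over the codebook indices.
# The env interface and every constant here are illustrative assumptions.
import numpy as np

def build_codebook(samples, k, iters=50, seed=0):
    """Lloyd/k-means style codebook over sampled state vectors (cf. LBG)."""
    rng = np.random.default_rng(seed)
    codebook = samples[rng.choice(len(samples), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every sample to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = samples[nearest == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)  # move codeword to centroid
    return codebook

def quantize(state, codebook):
    """Map a continuous state to the index of its nearest codeword."""
    return int(np.linalg.norm(codebook - state, axis=1).argmin())

def q_learning(env, codebook, n_actions, episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning over the quantized (discrete) state space."""
    Q = np.zeros((len(codebook), n_actions))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = quantize(env.reset(), codebook)  # hypothetical env interface
        done = False
        while not done:
            # Epsilon-greedy action selection over the current Q estimates.
            a = rng.integers(n_actions) if rng.random() < epsilon \
                else int(Q[s].argmax())
            next_state, reward, done = env.step(a)
            s2 = quantize(next_state, codebook)
            Q[s, a] += alpha * (reward + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q
```

Quantizing first lets a standard tabular learner operate on a continuous pedestrian state (for example, position and velocity relative to the goal); that is the role vector quantization plays in the schemas the paper compares.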




Notes

  1. The term ‘trial’ on the abscissa of the graphs has the same meaning as the term ‘episode’ in the text.

  2. In machine learning, many different approaches are used to fill in unobserved features. We have informally studied some of them, specifically random imputation and mean imputation, obtaining similar performance (see the first sketch after these notes).

  3. In the experiments, we will show that 18 iterations are enough to ensure convergence in all the proposed scenarios.

  4. Assuming that a small variation in the parameter values produces a small variation in the learning performance (the experiments agree with this assumption), the learning parameters are found with a coarse search over the allowed values followed by a refinement around the candidate with the best performance (see the second sketch after these notes).

  5. Specifically, the policy \(\pi _0\) chooses randomly from the set of actions that turn the agent’s velocity vector towards the right side of the corridor (see the third sketch after these notes).

  6. To fit the size of the tables, we have abbreviated the schema names in all the tables. Thus, IT means ITVQQL and the prefix TF means “with transfer of knowledge”.
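
Note 2 mentions random imputation and mean imputation for unobserved features. A minimal sketch of both, assuming (as an illustrative convention, not the paper's) that feature vectors arrive as a NumPy matrix with NaN marking missing entries:

```python
# Two simple imputation strategies for missing (NaN) feature values.
# The NaN convention and function names are illustrative assumptions.
import numpy as np

def mean_impute(X):
    """Replace each missing entry with the mean of its column's observed values."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def random_impute(X, rng=None):
    """Replace each missing entry with a value drawn uniformly at random
    from the observed values of the same column."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        observed = X[~missing, j]
        X[missing, j] = rng.choice(observed, size=missing.sum())
    return X
```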
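
Note 4 describes the parameter search as a coarse pass over the allowed values followed by a refinement around the best candidate. A generic sketch of that coarse-to-fine procedure; the grid contents and the evaluate callback are hypothetical placeholders:

```python
# Coarse grid search followed by a finer grid centred on the best candidate.
# evaluate(params) -> score is a hypothetical callback (e.g., average reward
# over a fixed number of learning episodes); higher scores are assumed better.
import itertools
import numpy as np

def coarse_to_fine(evaluate, coarse_grid, spread=0.5, points=5):
    best, best_score = None, -np.inf

    def search(grid):
        nonlocal best, best_score
        for values in itertools.product(*grid.values()):
            candidate = dict(zip(grid.keys(), values))
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score

    search(coarse_grid)
    # Refinement: a narrow grid around each winning value. Clipping back
    # into the allowed parameter ranges is omitted for brevity.
    fine_grid = {k: np.linspace(v * (1 - spread), v * (1 + spread), points)
                 for k, v in best.items()}
    search(fine_grid)
    return best, best_score
```

For instance, coarse_grid might be {'alpha': [0.05, 0.1, 0.3], 'gamma': [0.9, 0.99]}, with evaluate running a short learning experiment per candidate.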
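
Note 5 describes the seed policy \(\pi _0\) as choosing uniformly among the actions that turn the agent's velocity vector towards the right side of the corridor. A sketch under the assumption, ours rather than the paper's, that actions are encoded by signed turn angles with negative angles turning rightward:

```python
# pi_0: pick uniformly among the "rightward" actions. The signed-angle
# action encoding is an illustrative assumption; at least one negative
# angle is assumed to exist in the action set.
import numpy as np

def pi_0(turn_angles, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    rightward = [a for a, angle in enumerate(turn_angles) if angle < 0]
    return int(rng.choice(rightward))
```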



Acknowledgments

This work has been jointly supported by the Spanish MICINN and European Commission FEDER funds under grants Consolider-Ingenio CSD2006-00046 and TIN2009-14475-C04-04. Fernando Fernández is supported by the Spanish MINECO under grants TIN2012-38079-C03-02 and TRA2009-0080.

Author information

Correspondence to Francisco Martinez-Gil.


About this article

Cite this article

Martinez-Gil, F., Lozano, M. & Fernández, F. Strategies for simulating pedestrian navigation with multiple reinforcement learning agents. Auton Agent Multi-Agent Syst 29, 98–130 (2015). https://doi.org/10.1007/s10458-014-9252-6


Keywords

Navigation