MARL-Ped: A multi-agent reinforcement learning based framework to simulate pedestrian groups

https://doi.org/10.1016/j.simpat.2014.06.005

Abstract

Pedestrian simulation is complex because behavior must be modeled at several levels. At the lowest level, local interactions between agents occur; at the middle level, strategic and tactical behaviors such as overtaking or route choice appear; and at the highest level, path planning is necessary. Agent-based pedestrian simulators either focus on a specific level (mainly the lowest one) or define strategies such as layered architectures to manage the different behavioral levels independently. In our Multi-Agent Reinforcement-Learning-based Pedestrian simulation framework (MARL-Ped), the problem is addressed as a whole. Each embodied agent uses a model-free Reinforcement Learning (RL) algorithm to learn autonomously to navigate in the virtual environment. The main goal of this work is to demonstrate empirically that MARL-Ped generates learned behaviors adapted to the level required by the pedestrian scenario. Three experiments described in the pedestrian modeling literature are presented to test our approach: (i) the choice between the shortest path and the quickest path; (ii) a crossing between two groups of pedestrians walking in opposite directions inside a narrow corridor; (iii) two agents that move in opposite directions inside a maze. The results show that MARL-Ped solves the different problems, learning individual behaviors with characteristics of pedestrians (local control that produces adequate fundamental diagrams, route-choice capability, emergence of collective behaviors and path planning). In addition, we compared our model with Helbing’s social forces model, a well-known pedestrian model, and found similarities between the pedestrian dynamics generated by both approaches. These results demonstrate empirically that MARL-Ped generates varied, plausible behaviors, producing human-like macroscopic pedestrian flow.

Introduction

In the current state of the art there are several pedestrian simulation approaches that focus on steering the individuals (microscopic simulation) to generate both individual and group pedestrian behaviors. Microscopic pedestrian models consider the individual interactions and try to model the position and velocity of each pedestrian over time. Among the most representative seminal microscopic models of pedestrians are cellular automata models [1], behavioral rule-based models [2], cognitive models [3], Helbing’s social forces model [4] and psychological models [5]. In a microscopic simulator, the individuals are simulated as independent entities that interact with each other and with the environment, making decisions to modify their dynamic state (the calculation of the sum of a set of forces counts as a kind of decision). The decision-making process in microscopic simulators follows a hierarchical scheme [6]: strategical, tactical and operational. The destinations and path planning are chosen at the strategical level, the route choice is performed at the tactical level and the instantaneous decisions to modify the kinematic state are taken at the operational level. Several microscopic simulators that focus on reproducing local interactions work only at the operational level [7].

A common problem in microscopic models is the relationship between the individual behaviors and the group behavior. Traditionally, rule-based systems [8], [4] are the most popular in this area to simulate local interactions. However, due to the complexity of multi-agent collision avoidance, it is difficult to generate a lifelike group motion that follows the local rules [9]. Most agent-based models separate the local interactions from the necessary global path planning. There are two main approaches to do this. One is to pre-compute or user-edit a path-planning map that is represented as a guidance field [9] or as a potential and velocity field [10]. The other consists of separating the local and global navigation problems in a layered model [11]. Making that split inside the agent model has the advantage that intelligent or psychological properties can be introduced into the agents’ behaviors [5], [12]. One indicator that this relationship is correctly resolved is that certain collective patterns appear when groups of pedestrians are placed in specific situations, as happens in the real world. Several collective behaviors have been described in specific group situations, such as lane formation in corridors [13], the faster-is-slower effect [14] and arch-like clogging at bottlenecks [15], [13]. Social forces and its variants [4], agent-based models [16] and animal-based approaches [17] are microscopic models that have been successful in producing emergent collective pedestrian behaviors using different approaches. In pedestrian modeling, the capability to reproduce these collective behavior or self-organization phenomena is an indicator of the quality of the model.

In this work, a multi-agent RL-based framework for pedestrian simulation (MARL-Ped) is evaluated in three different scenarios that are described in Section 5. Each scenario poses a different simulation problem. This framework takes a different approach from existing microscopic simulators: it uses learning techniques to create an individual controller for the navigation of each simulated pedestrian. The MARL-Ped framework offers the following benefits:

  1. Behavior building instead of behavior modeling. The user does not have to specify guidance rules or other models to define the pedestrian’s behavior. Only high-level restrictions on the behavior of the agents are included in the framework, as feedback signals in the form of immediate rewards (e.g. reaching the goal is good and the agent gets a positive reward; going out of the borders is bad and the agent gets a negative reward).

  2. Real-time simulation. The decision-making module of each embodied agent (pedestrian) is calculated offline. At simulation time, only the sum of the pre-calculated terms of a linear function is needed to obtain the corresponding best action.

  3. Emergent collective behaviors. The framework is capable of generating emergent collective behaviors.

  4. Multi-level learned behaviors. The resulting learned behaviors control the velocity of the agent, which is a task of the operational level, but they are also capable of path planning and route choice, which are tasks corresponding to the strategical and tactical levels respectively.

  5. Heterogeneous behaviors. The learned behaviors are different for each agent, providing variability in the simulation. This heterogeneity is intrinsic to the learned behaviors.
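
As a rough illustration of benefits 1 and 2, the sketch below pairs a reward signal of the kind described (positive on reaching the goal, negative on leaving the borders) with greedy action selection from a linear function. The reward magnitudes, the feature vector, the action names and the weights are all hypothetical; the text above states only the signs of the rewards and that the best action comes from summing pre-calculated terms.

```python
from typing import Dict, List, Sequence

# Hypothetical reward magnitudes; the text only states their signs.
GOAL_REWARD = 1.0
OUT_OF_BORDERS_REWARD = -1.0


def immediate_reward(reached_goal: bool, out_of_borders: bool) -> float:
    """Feedback signal: reaching the goal is good, leaving the borders is bad."""
    if out_of_borders:
        return OUT_OF_BORDERS_REWARD
    if reached_goal:
        return GOAL_REWARD
    return 0.0


def linear_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Sum of weighted terms: a dot product of learned weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def best_action(weights_per_action: Dict[str, List[float]],
                features: Sequence[float]) -> str:
    """Greedy choice: the action whose linear value is highest."""
    return max(weights_per_action,
               key=lambda a: linear_value(weights_per_action[a], features))


# Hypothetical weights standing in for values learned offline.
learned = {
    "slow_down": [0.1, -0.4],
    "keep_speed": [0.5, 0.2],
    "speed_up": [0.3, 0.9],
}
state_features = [1.0, 0.5]  # e.g. a bias term and a normalized distance
chosen = best_action(learned, state_features)
```

Because the weights are fixed after learning, choosing an action at simulation time costs one small dot product per candidate action, which is consistent with the real-time claim.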

The aim of our work is not to provide a new pedestrian model (which would imply matching real data) but to create plausible simulations of pedestrian groups (in terms of their adequacy to pedestrian dynamics) to be used in virtual environments. In this animation context, agent-based pedestrian simulation is an active research field [18], [10] that considers simulations ranging from small groups to crowds. Through the experiments mentioned above, we demonstrate that MARL-Ped is capable of generating realistic simulations of groups of pedestrians that solve navigational problems at different levels (operational, tactical, strategical), handling the individual/group behavior relationship problem mentioned before to produce the emergence of collective behaviors.

To show that the learned behaviors resemble those of pedestrians, we compare our results with similar scenarios defined in Helbing’s social forces pedestrian model. This well-known model in the pedestrian modeling field has characteristics in common with MARL-Ped: it is a microscopic model that also uses a driving force to get the desired velocity of the agent. The comparison is carried out using fundamental diagrams and density maps, which are common tools in pedestrian dynamics analysis.
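
A fundamental diagram relates pedestrian density to speed (or flow). As a minimal sketch, assuming per-frame positions and speeds are available and using a hypothetical rectangular measurement area, one point of such a diagram could be computed per frame as follows. The data layout, units and function name are illustrative assumptions, not taken from MARL-Ped.

```python
from typing import List, Optional, Tuple

# One pedestrian sample: (x, y, speed). Assumed units: meters and m/s.
Sample = Tuple[float, float, float]


def diagram_point(frame: List[Sample],
                  area: Tuple[float, float, float, float]
                  ) -> Optional[Tuple[float, float]]:
    """Return (density, mean speed) inside a rectangular measurement area.

    area is (x_min, y_min, x_max, y_max). Returns None if nobody is inside.
    """
    x_min, y_min, x_max, y_max = area
    speeds_inside = [s for (x, y, s) in frame
                     if x_min <= x <= x_max and y_min <= y <= y_max]
    if not speeds_inside:
        return None
    surface = (x_max - x_min) * (y_max - y_min)
    density = len(speeds_inside) / surface               # pedestrians per m^2
    mean_speed = sum(speeds_inside) / len(speeds_inside)  # m/s
    return density, mean_speed


# Three pedestrians; only the first two fall inside the 4 m x 2 m area.
frame = [(1.0, 1.0, 1.2), (2.0, 1.5, 0.8), (9.0, 9.0, 1.4)]
point = diagram_point(frame, (0.0, 0.0, 4.0, 2.0))
```

Collecting such points over many frames and plotting speed (or density times speed, the flow) against density gives the diagram that can then be compared between two simulators.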

The rest of the paper is organized as follows. In Section 2 we present the related work. In Section 3, some fundamentals of RL and an overview of the framework are described. Section 4 describes the modules of MARL-Ped. In Section 5, we describe the configuration of the scenarios. In Sections 6 and 7 the results are discussed, and in Section 8 the conclusions and future work are presented.

Section snippets

Related work

From the point of view of the theoretical foundations, our work has similarities with Hoogendoorn’s pedestrian route-choice model [19]. In that work the authors propose a Bellman-based optimization process to optimize a utility function designed as a weighted sum of route attributes. Using dynamic programming, a value function is calculated for the different spatial regions and used to find the pedestrian’s route. In our approach, the utility function is substituted by an immediate reward

Background and general overview

In this section we give an overview of the RL basic concepts used in this work and present our overall approach for MARL-Ped.
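
For readers unfamiliar with model-free RL, the sketch below shows a single temporal-difference update in tabular Q-learning. The text states only that each agent uses a model-free RL algorithm; Q-learning is used here purely as a familiar member of that family, and the state names, action names, learning rate and discount factor are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One update: Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# All action values start at zero; one rewarded transition is applied.
q: QTable = defaultdict(float)
actions = ["left", "right"]
q_update(q, "s0", "right", 1.0, "s1", actions)
```

The update needs only the observed transition (state, action, reward, next state) and no model of the environment, which is what "model-free" refers to.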

MARL-Ped framework description

In Fig. 1, a functional diagram of MARL-Ped’s agents is displayed. The modules have been enumerated with labels (Mi) to be more easily identified.

Description of the simulated scenarios

In this section, the scenarios of the experiments are introduced. These scenarios model common situations for real pedestrians in urban environments.

Learning results

In this section, the configuration of each learning process and the performance it reached are described. There is no fixed pattern for defining the parameter configuration and the strategies to be used in the learning process, because each scenario has its own challenges that have to be addressed specifically.

Simulation results

In this section, we highlight the performance of MARL-Ped on the described scenarios. First we introduce the tools used to analyze different aspects of the performance. Then, we present the results and illustrate them with videos that can be seen at http://www.uv.es/agentes/RL. The videos with 3D virtual environments have been recorded and visualized in real time using the Unity 3D engine.

Conclusions and future work

In this paper we explore the capabilities of our RL-based framework for pedestrian simulation (MARL-Ped) in three different paradigmatic situations. The main contribution of this work is the empirical demonstration that RL techniques are capable of converging to policies that generate pedestrian behaviors. In contrast to other pedestrian models, our framework solves the navigation problems at different pedestrian behavior levels (strategical, tactical and operational). Thus, the shortest path

Acknowledgements

The authors want to acknowledge Dr. Illés Farkas, who kindly provided us with the code for Helbing’s corridor experiment and helped us with the configuration parameters.

This work has been partially supported by the University of Valencia under project UV-INV-PRECOMP13-115032, the Spanish MICINN and European Commission FEDER funds under Grants Consolider-Ingenio CSD2006-00046, TIN2009-14475-C04-04, TRA2009-0080. Fernando Fernández is supported by Grant TIN2012-38079-C03-02 of Ministerio de

References (46)

  • W. Daamen, Modelling Passenger Flows in Public Transport Facilities, Ph.D. thesis, Delft University of Technology, The...
  • C. Reynolds, Steering behaviors for autonomous characters
  • S. Patil et al., Directing crowd simulations using navigation fields, IEEE Trans. Visual. Comput. Graph. (2011)
  • A. Treuille et al., Continuum crowds, ACM Trans. Graph. (TOG) (2006)
  • M. Sung et al., Scalable behaviors for crowd simulations
  • D. Helbing et al., Self-organized pedestrian crowd dynamics: experiments, simulations, and design solutions, Transport. Sci. (2005)
  • T.I. Lakoba et al., Modifications of the Helbing–Molnár–Farkas–Vicsek social force model for pedestrian evolution, Simulation (2005)
  • D. O’Sullivan et al., Agent-based models and individualism: is the world agent-based?, Environ. Plann. A (2000)
  • J. Pettre et al., simulation and validation of interactions between virtual walkers
  • S. Guy et al., Least-effort trajectories lead to emergent crowd behaviors, Phys. Rev. E (2012)
  • G.K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology (1949)
  • K. Still, Crowd Dynamics, Ph.D. thesis, Department of Mathematics, Warwick University, UK, August...
  • D. Helbing et al., Pedestrian, crowd and evacuation dynamics
