MARL-Ped: A multi-agent reinforcement learning based framework to simulate pedestrian groups
Introduction
In the current state of the art there are several pedestrian simulation approaches that focus on steering the individuals (microscopic simulation) to generate both individual and group pedestrian behaviors. Microscopic pedestrian models consider the individual interactions and try to model the position and velocity of each pedestrian over time. Among the most representative seminal microscopic pedestrian models are the cellular automata models [1], behavioral rule-based models [2], cognitive models [3], Helbing’s social forces model [4] and psychological models [5]. In a microscopic simulator, the individuals are simulated as independent entities that interact with each other and with the environment, taking decisions to modify their dynamic state (the calculation of the sum of a set of forces also counts as a kind of decision). The decision-making process in microscopic simulators follows a hierarchical scheme [6]: strategical, tactical and operational. Destinations and path planning are chosen at the strategical level, route choice is performed at the tactical level, and the instantaneous decisions that modify the kinematic state are taken at the operational level. Several microscopic simulators that focus on reproducing local interactions work only at the operational level [7].
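To illustrate the remark that summing a set of forces can itself be viewed as a decision, the following Python sketch advances pedestrians with a simplified social-force-style update: a driving term that relaxes each pedestrian's velocity toward its desired velocity, plus a pairwise exponential repulsion. The functional form, the class `Pedestrian`, the helpers `driving_force`, `repulsive_force` and `step`, and every parameter value (desired speed, relaxation time `tau`, strengths `A` and `B`, body radius) are assumptions chosen for illustration. This is not a calibrated implementation of Helbing's model [4]; it omits wall forces and fluctuations.

```python
import math
from dataclasses import dataclass


@dataclass
class Pedestrian:
    x: float
    y: float
    vx: float
    vy: float
    goal_x: float
    goal_y: float
    desired_speed: float = 1.34  # m/s (illustrative value)
    tau: float = 0.5             # relaxation time in s (assumed)
    radius: float = 0.3          # body radius in m (assumed)


def driving_force(p: Pedestrian) -> tuple[float, float]:
    """Per-unit-mass driving term (v0 * e - v) / tau, e = unit vector to the goal."""
    dx, dy = p.goal_x - p.x, p.goal_y - p.y
    dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero at the goal
    ex, ey = dx / dist, dy / dist
    return ((p.desired_speed * ex - p.vx) / p.tau,
            (p.desired_speed * ey - p.vy) / p.tau)


def repulsive_force(p: Pedestrian, q: Pedestrian,
                    A: float = 2.0, B: float = 0.3) -> tuple[float, float]:
    """Per-unit-mass repulsion A * exp((r_p + r_q - d) / B), pointing from q to p (A, B assumed)."""
    dx, dy = p.x - q.x, p.y - q.y
    d = math.hypot(dx, dy) or 1e-9
    magnitude = A * math.exp((p.radius + q.radius - d) / B)
    return magnitude * dx / d, magnitude * dy / d


def step(peds: list[Pedestrian], dt: float = 0.05) -> None:
    """One semi-implicit Euler step; each pedestrian's 'decision' is the sum of its forces."""
    accelerations = []
    for p in peds:
        ax, ay = driving_force(p)
        for q in peds:
            if q is not p:
                rx, ry = repulsive_force(p, q)
                ax += rx
                ay += ry
        accelerations.append((ax, ay))
    # Update all pedestrians only after every force has been computed.
    for p, (ax, ay) in zip(peds, accelerations):
        p.vx += ax * dt
        p.vy += ay * dt
        p.x += p.vx * dt
        p.y += p.vy * dt
```

For example, a single pedestrian at rest at the origin with its goal at (10, 0) picks up a velocity along +x after one call to `step`, while two nearby pedestrians push each other apart. Everything here is purely operational-level control: no path planning takes place.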
A common problem in microscopic models is the relationship between individual behaviors and group behavior. Traditionally, rule-based systems [8], [4] have been the most popular in this area for simulating local interactions. However, due to the complexity of multi-agent collision avoidance, it is difficult to generate lifelike group motion that follows the local rules [9]. Most agent-based models separate the local interactions from the necessary global path planning, and there are two main approaches to doing so. One is to pre-compute or user-edit a path-planning map that is represented as a guidance field [9] or as a potential and velocity field [10]. The other consists of separating the local and global navigation problems in a layered model [11]. Making that split inside the agent model has the advantage that intelligent or psychological properties can be introduced into the agents’ behaviors [5], [12]. One indicator that this relationship has been correctly resolved is that certain collective patterns appear when groups of pedestrians are placed in specific situations, as happens in the real world. Several collective behaviors have been described in specific group situations, such as lane formation in corridors [13], the faster-is-slower effect [14] and arch-like clogging at bottlenecks [15], [13]. The social forces model and its variants [4], agent-based models [16] and animal-based approaches [17] are microscopic models that, using different approaches, have succeeded in producing emergent collective pedestrian behaviors. In pedestrian modeling, the capability to reproduce these collective behaviors, or self-organization phenomena, is an indicator of the quality of the model.
In this work, a multi-agent RL-based framework for pedestrian simulation (MARL-Ped) is evaluated in three different scenarios that are described in Section 5. Each scenario poses a different simulation problem. This framework is a different approach from existing microscopic simulators: it uses learning techniques to create an individual controller for the navigation of each simulated pedestrian. The MARL-Ped framework offers the following benefits:
1. Behavior building instead of behavior modeling. The user does not have to specify guidance rules or other models to define the pedestrian’s behavior. Only high-level restrictions on the agents’ behavior are included in the framework, as feedback signals in the form of immediate rewards (e.g. reaching the goal is good, so the agent gets a positive reward; going out of the borders is bad, so it gets a negative reward).
2. Real-time simulation. The decision-making module of each embodied agent (pedestrian) is calculated offline. At simulation time, only the addition of the pre-calculated terms of a linear function is needed to obtain the corresponding best action.
3. It is capable of generating emergent collective behaviors.
4. Multi-level learned behaviors. The resulting learned behaviors control the velocity of the agent, which is a task of the operational level, but they are also capable of path planning and route choice, which are tasks of the strategical and tactical levels respectively.
5. Heterogeneous behaviors. The learned behaviors differ from agent to agent, providing variability in the simulation. This heterogeneity is intrinsic to the learned behaviors.
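Benefits 1 and 2 can be illustrated with a minimal Python sketch. `immediate_reward` encodes only outcome-level feedback (goal reached, borders left) with no guidance rules, and `greedy_action` shows why run-time decisions are cheap when the action-value function is linear: assuming binary features of the kind produced by tile coding, Q(s, a) is just the sum of the offline-learned weights of the features active in state s, and the agent picks the action with the largest sum. The reward values, the rectangular-border and goal-radius conventions, the feature encoding and all function names are hypothetical; the actual state features, actions and reward magnitudes of MARL-Ped are not given here.

```python
import math

# Hypothetical reward values: the text only says that reaching the goal is
# rewarded positively and leaving the borders is penalized.
GOAL_REWARD = 100.0
OUT_OF_BOUNDS_REWARD = -100.0
STEP_REWARD = 0.0


def immediate_reward(x: float, y: float, goal: tuple[float, float],
                     goal_radius: float, width: float, height: float) -> float:
    """High-level feedback signal (benefit 1): outcomes only, no guidance rules."""
    if math.hypot(x - goal[0], y - goal[1]) <= goal_radius:
        return GOAL_REWARD
    if not (0.0 <= x <= width and 0.0 <= y <= height):
        return OUT_OF_BOUNDS_REWARD
    return STEP_REWARD


def greedy_action(active_features: list[int],
                  weights: list[list[float]]) -> tuple[int, float]:
    """Benefit 2: Q(s, a) = sum of pre-learned weights over the active binary features.

    weights[a][i] is the (offline-learned) weight of feature i for action a;
    at run time only additions and comparisons are needed.
    """
    best_action, best_value = -1, -math.inf
    for action, w in enumerate(weights):
        q = sum(w[i] for i in active_features)
        if q > best_value:
            best_action, best_value = action, q
    return best_action, best_value
```

For instance, with features 0 and 2 active and weight rows `[0.1, 0.2, 0.3]`, `[0.5, 0.0, 0.4]` and `[0.0, 0.0, 0.0]`, the summed values are 0.4, 0.9 and 0.0, so action 1 is chosen. The cost per decision grows only with the number of active features times the number of actions, which is what makes real-time simulation feasible.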
The aim of our work is not to provide a new pedestrian model (which would require matching against real data) but to create plausible simulations of pedestrian groups (in terms of their adequacy to pedestrian dynamics) to be used in virtual environments. In this animation context, agent-based pedestrian simulation is an active research field [18], [10] that covers simulations ranging from small groups to crowds. Through the experiments mentioned above, we demonstrate that MARL-Ped is capable of generating realistic simulations of groups of pedestrians that solve navigational problems at different levels (operational, tactical, strategical), handling the individual/group behavior relationship problem mentioned before and producing the emergence of collective behaviors.
In order to show that the learned behaviors resemble those of pedestrians, we compare our results with similar scenarios defined in Helbing’s social forces pedestrian model. This model, well known in the pedestrian modeling field, has characteristics in common with MARL-Ped: it is a microscopic model that also uses a driving force to reach the desired velocity of the agent. The comparison is carried out using fundamental diagrams and density maps, which are common tools in the analysis of pedestrian dynamics.
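As a rough illustration of the two analysis tools, the Python sketch below computes (a) a density map, as the average number of pedestrians per square metre in each grid cell over a sequence of frames, and (b) one point of a fundamental diagram, using the density ρ = N/A inside a rectangular measurement area, the mean speed v̄ of the pedestrians in it, and the specific flow J = ρ·v̄. The grid-counting and instantaneous-box measurement methods, the function names and the input layout (lists of (x, y) positions and (vx, vy) velocities) are assumptions; the exact measurement procedure used in the paper is not described here.

```python
import math


def density_map(frames: list[list[tuple[float, float]]], cell_size: float,
                width: float, height: float) -> list[list[float]]:
    """Average pedestrians per m^2 in each grid cell over all frames.

    Rows are indexed by the y-cell, columns by the x-cell.
    """
    nx, ny = int(width // cell_size), int(height // cell_size)
    counts = [[0] * nx for _ in range(ny)]
    for positions in frames:
        for x, y in positions:
            if 0.0 <= x < width and 0.0 <= y < height:
                i = min(int(x // cell_size), nx - 1)
                j = min(int(y // cell_size), ny - 1)
                counts[j][i] += 1
    norm = max(len(frames), 1) * cell_size * cell_size  # frames times cell area
    return [[c / norm for c in row] for row in counts]


def fundamental_diagram_point(positions: list[tuple[float, float]],
                              velocities: list[tuple[float, float]],
                              box: tuple[float, float, float, float]
                              ) -> tuple[float, float, float]:
    """Return (density, mean speed, specific flow = density * mean speed) inside box."""
    x0, y0, x1, y1 = box
    speeds = [math.hypot(vx, vy)
              for (x, y), (vx, vy) in zip(positions, velocities)
              if x0 <= x < x1 and y0 <= y < y1]
    density = len(speeds) / ((x1 - x0) * (y1 - y0))
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    return density, mean_speed, density * mean_speed
```

Collecting `fundamental_diagram_point` results over many frames and plotting speed or flow against density yields the fundamental diagram; comparing those curves, and the density maps, between MARL-Ped and social-force runs of the same scenario is the kind of comparison the paragraph refers to.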
The rest of the paper is organized as follows. In Section 2 we present the related work. In Section 3, some fundamentals of RL and an overview of the framework are given. Section 4 describes the modules of MARL-Ped. In Section 5, we describe the configuration of the scenarios. In Sections 6 and 7 the results are discussed, and in Section 8 the conclusions and future work are presented.
Related work
From the point of view of the theoretical foundations, our work has similarities with Hoogendoorn’s pedestrian route-choice model [19]. In that work the authors propose a Bellman-based optimization process to optimize a utility function designed as a weighted sum of route attributes. Using dynamic programming, a value function is calculated for the different spatial regions and used to find the pedestrian’s route. In our approach, the utility function is substituted by an immediate reward
Background and general overview
In this section we give an overview of the RL basic concepts used in this work and present our overall approach for MARL-Ped.
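The Bellman-based dynamic programming mentioned in the related work, and the RL concepts introduced in this section, both rest on the Bellman optimality equation V(s) = max_a [r(s, a) + γ V(s')]. The Python sketch below applies it by value iteration over a small grid of spatial cells, using a hypothetical constant step reward as the immediate reward, and then follows the greedy route to the goal. It assumes a known, deterministic, 4-connected grid model; the names `value_iteration`, `greedy_route` and `MOVES`, the discount `gamma` and the reward value are all illustrative. MARL-Ped's agents learn their policies through RL rather than by solving a known model, so this is background only, not the framework's learning algorithm.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-connected grid (assumed)


def value_iteration(width, height, goal, obstacles=frozenset(),
                    step_reward=-1.0, gamma=0.95, tol=1e-9):
    """Solve V(s) = max_a [r + gamma * V(s')] for deterministic grid moves.

    The goal is terminal (V = 0); moving into a wall or an obstacle leaves
    the agent where it is.
    """
    V = {(x, y): 0.0 for x in range(width) for y in range(height)
         if (x, y) not in obstacles}
    while True:
        delta = 0.0
        for s in V:
            if s == goal:
                continue
            best = max(step_reward + gamma * V.get((s[0] + dx, s[1] + dy), V[s])
                       for dx, dy in MOVES)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V


def greedy_route(V, start, goal, max_steps=10_000):
    """Repeatedly move to the neighbouring cell with the highest value."""
    path, s = [start], start
    while s != goal and len(path) <= max_steps:
        neighbours = [(s[0] + dx, s[1] + dy) for dx, dy in MOVES
                      if (s[0] + dx, s[1] + dy) in V]
        s = max(neighbours, key=V.__getitem__)
        path.append(s)
    return path
```

On a 3 × 2 grid with the goal at (2, 0) and an obstacle at (1, 0), the greedy route from (0, 0) detours through the upper row: (0, 0), (0, 1), (1, 1), (2, 1), (2, 0). Here the route emerges from the value function rather than from explicit path-planning rules.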
MARL-Ped framework description
In Fig. 1, a functional diagram of MARL-Ped’s agents is displayed. The modules have been labeled so that they can be identified more easily.
Description of the simulated scenarios
In this section, the scenarios of the experiments are introduced. These scenarios model common situations for real pedestrians in urban environments.
Learning results
In this section, the configuration of each learning process and the performance it reached are described. There is no fixed pattern for defining the parameter configuration and the strategies to be used in the learning process, because each scenario has its own challenges that have to be addressed specifically.
Simulation results
In this section, we highlight the performance of MARL-Ped in the described scenarios. First we introduce the tools used to analyze different aspects of the performance. Then we present the results and illustrate them with videos that can be seen at http://www.uv.es/agentes/RL. The videos with 3D virtual environments have been recorded and visualized in real time using the Unity 3D engine.
Conclusions and future work
In this paper we explore the capabilities of our RL-based framework for pedestrian simulation (MARL-Ped) in three different paradigmatic situations. The main contribution of this work is the empirical demonstration that RL techniques are capable of converging to policies that generate pedestrian behaviors. Unlike other pedestrian models, our framework solves the navigation problems at different pedestrian behavior levels (strategical, tactical and operational). Thus, the shortest path
Acknowledgements
The authors wish to acknowledge Dr. Illés Farkas, who kindly provided us with the code for Helbing’s corridor experiment and helped us with the configuration parameters.
This work has been partially supported by the University of Valencia under project UV-INV-PRECOMP13-115032, the Spanish MICINN and European Commission FEDER funds under Grants Consolider-Ingenio CSD2006-00046, TIN2009-14475-C04-04, TRA2009-0080. Fernando Fernández is supported by Grant TIN2012-38079-C03-02 of Ministerio de
References (46)
- et al. A microsimulation model for pedestrian flows. Math. Comput. Simulat. (1985)
- et al. Specification, estimation and validation of a pedestrian walking behavior model. Transport. Res. (2009)
- et al. Two-level modeling framework for pedestrian route choice and walking behaviors. Simulat. Model. Pract. Theory (2012)
- et al. Morphological and dynamical aspects of the room evacuation process. Phys. A: Stat. Mech. Appl. (2007)
- et al. Animal dynamics based approach for modeling pedestrian crowd egress under panic conditions. Transport. Res. Part B: Methodol. (2011)
- et al. Pedestrian route-choice and activity scheduling theory and models. Transport. Res. Part B: Methodol. (2004)
- Evolution of corridor following behavior in a noisy world
- et al. Autonomous pedestrians
- et al. Social force model for pedestrian dynamics. Phys. Rev. E (1995)
- et al. Controlling individual agents in high-density crowd simulation