1 Introduction

Effective social planning and behavioral understanding are essential to improving global human well-being. Given changing global economic, social, and political landscapes, how will organizations and nations forecast and prioritize their internal and external actions? The complex nature of human interaction, especially on a global scale, makes it difficult for governments and organizations to effectively prioritize their policies and resource spending. Despite libraries full of expert strategic guidance and lessons learned in diplomatic, military, and development efforts, a major portion of the human population still lacks the basic physical and social resources needed for personal well-being. In a rapidly evolving and complex landscape, we must develop new tools to better leverage our accumulated knowledge, experience, and the myriad of new data sensors and emerging computational technologies.

One such tool is our newfound ability to produce large scale social simulations that model behavioral patterns. In this work we focus specifically on computational social science simulation tools for modeling the global human population given current computer system capabilities. Fortunately, computational capability is growing faster than the human population of Earth, which will allow us to increase the complexity (and accuracy) of our models and make global scale simulations more tractable on increasingly less expensive IT infrastructure.

1.1 Related Work

Over the past decade a number of researchers have demonstrated that large scale agent-based simulations are viable. In 2008, Lysenko and D’Souza [1] used GPGPU accelerated techniques to model up to 16 million agents spatially using an AMD Athlon64 3500+ with 1 GB RAM and an NVIDIA GeForce 8800 GTX GPU. In 2010, Rakowski et al. [2] created a grid-based framework to simulate the 38 million human population of Poland using data from LandScan and the Polish National Census Bureau. In 2011, Parker and Epstein [3] used GSAM to implement a graph-based model of disease propagation amongst 6.75 billion people using 32 CPU cores and 256 GB of memory. In 2014, Richmond [4] used FLAME GPU [5, 6] 1.3 to simulate up to 16.7 million simple agents with SugarScape on an Intel Core i7-2600K machine using an NVIDIA K40 GPU with CUDA 6.0. In 2015, Collier et al. [7] simulated the spread of CA-MRSA throughout the population of Chicago (2.9 million people) using a graph-based representation and Repast HPC. Also in 2015, Lettieri et al. [8] modeled the spread of social norms using D-MASON [9] amongst a graph-based population of 2.5 million commuters on a cluster of eight 16-core servers. Collectively, this work demonstrates the computational capability that motivates the evolution of global scale models.

2 Creating a Global Scale Human Well-Being Simulation

2.1 A Target Framework

Our societies have developed advanced technologies to meet diverse needs (advanced medicines, self-driving cars, new energy sources), but we have yet to solve global grand challenges such as hunger, violence, and economic inequality. Understanding global complexities with increased accuracy and resolution is essential to formulating effective global strategy, policy, and plans. If the computational capability of our devices continues to double every two years (leveraging parallelism to complement deviations from Moore’s Law), we must start developing now the software frameworks (and parallel algorithms) for 2025 that can leverage highly parallel systems roughly 30\(\times \) more capable than today’s leading technology. The ability to model 10 billion human agents globally should not be a hurdle; we must make it part of the solution.

As one example framework (Fig. 1), a Global Open Simulator (GOS) would provide a dynamically re-scalable platform on which to develop, test, verify and validate strategy and plans at a speed, breadth, complexity and depth of resolution previously deemed intractable. The open nature of the framework is a differentiator from prior attempts, which became unsustainably mired in licensing restrictions and limited support and expertise. A modular GOS framework will allow for layered application of factors that influence human behavior, such as culture, government and media. The level of aggregation can vary from a global scale of 10 billion agents (humans) down to an individual. Additionally, variable temporal resolution allows the strategic analyst to set the time step (hour, day, month, etc.) and dynamically change resolution when an emergent behavior is triggered (e.g., agents reach a certain hunger or insecurity level).

Fig. 1. A visual representation of a target GOS framework.

2.2 Algorithm Analysis

Before running actual tests at the scale of 1 billion agents (as discussed in the next section), we wanted to evaluate the basic algorithmic bounds of our simulation in terms of computational requirements, data size requirements (both transient RAM and persistent disk storage), and, ultimately, the potential computational platforms on which to run our tests.

In the Worst Case: First, assume every person on the planet interacts with everyone else within a single time step; we can thus use \(\eta ^2\) as an upper bound on interactions per step, where \(\eta \) is the number of agents (order 10 billion). Second, assume every agent in the simulation is updated at every step, and let \(\sigma \) denote the number of steps. For instance, if we want each step to represent an hour, we would need 365 \(\times \) 24 steps to study one year. Third, we must account for the number of attributes (\(\alpha \)) each agent has, assuming each attribute is recalculated for each agent at each step. The final upper bound on the computational complexity can be modeled as follows:

$$\begin{aligned} \text {Worst Case Complexity} = \eta ^2 \times \sigma \times \alpha \end{aligned}$$
(1)
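To make the magnitude concrete, plugging in illustrative values (\(\eta = 10^{10}\) agents, \(\sigma = 365 \times 24\) hourly steps for one simulated year, and an assumed \(\alpha = 10\) attributes per agent) gives

$$\begin{aligned} (10^{10})^2 \times (365 \times 24) \times 10 \approx 8.8 \times 10^{24} \text { attribute updates per simulated year.} \end{aligned}$$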

For the data complexity, each agent (and its attributes) must be stored in RAM to minimize data access time. Additionally, we need to store information about the connectivity between agents if connections persist across time steps (or if the model requires historical reference to prior connectivity). As a result, the RAM requirement is the number of agents multiplied by the size of each agent, plus storage for their connections. If we assume 10 billion agents, each holding 1 KB of data, we will need 10 TB of RAM for agent state alone. If the \(\eta ^2\) worst-case interactions each persist as a connection, the worst-case RAM for \(\eta ^2\) edges \(\times \) 1 byte quickly exceeds even that of major HPC clusters, where RAM can be accessed in aggregate via MPI. This brief analysis therefore directs us toward alternate implementations that reduce RAM consumption.
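These back-of-envelope figures can be reproduced in a few lines of code. The following C++ sketch uses the assumptions of this section (10 billion agents, 1 KB of state per agent, 1 byte per worst-case edge) plus an assumed 10 attributes per agent; the constants are illustrative, not measured values.

// estimate.cpp -- back-of-envelope bounds for a global scale agent model.
// Constants mirror the assumptions in this section; they are illustrative.
#include <cstdio>

int main() {
    const double agents        = 1e10;          // eta: order 10 billion humans
    const double steps         = 365.0 * 24.0;  // sigma: hourly steps for one year
    const double attributes    = 10.0;          // alpha: attributes per agent (assumed)
    const double bytesPerAgent = 1024.0;        // 1 KB of state per agent
    const double bytesPerEdge  = 1.0;           // 1 byte per stored connection

    // Eq. (1): worst-case attribute updates across all pairwise interactions.
    const double worstCaseOps = agents * agents * steps * attributes;

    // Transient RAM: agent state plus worst-case eta^2 edges.
    const double agentRamTB = agents * bytesPerAgent / 1e12;
    const double edgeRamTB  = agents * agents * bytesPerEdge / 1e12;

    std::printf("worst-case updates per simulated year: %.2e\n", worstCaseOps);
    std::printf("agent state RAM: %.1f TB\n", agentRamTB);
    std::printf("worst-case edge RAM: %.2e TB\n", edgeRamTB);
    return 0;
}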

Better Cases: One way to reduce the computational complexity is to introduce dynamic multiscale modeling: instead of updating all 10 billion agents every step, we would update only a small aggregation of agents per step, with the entire population recalibrated only after several steps. Another way to greatly reduce the computational complexity is to reduce the number of interactions per step. It is unrealistic to assume that everyone on the planet interacts with every other person at each step (note: this paper explicitly makes no argument about which models are most realistic). A better model reduces the number of interactions from order \(\eta ^2\) to something smaller, such as order \(\eta \log \eta \) or just order \(\eta \). For instance, if we assume that people interact only with the members of a nuclear family of 10 people (or 10 random people) at each step, the number of interactions per step is reduced to \(10\,\times \,\eta \).
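A minimal C++ sketch of this reduction follows; the agent count is shrunk to one million and the contact list size of 10 is assumed, so the structure rather than the scale is the point. Each agent stores only a small contact list and interacts only with those contacts at each step, giving order \(10\,\times \,\eta \) interactions rather than \(\eta ^2\).

// interactions.cpp -- sketch of reducing per-step interactions from order
// eta^2 to order 10 x eta via a small, fixed contact list per agent.
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const std::size_t numAgents        = 1000000; // stand-in for eta
    const std::size_t contactsPerAgent = 10;      // assumed family/peer group size

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<std::size_t> pick(0, numAgents - 1);

    // Each agent stores only the indices of its contacts, not eta - 1 edges.
    std::vector<std::vector<std::uint32_t>> contacts(numAgents);
    for (auto& c : contacts) {
        c.reserve(contactsPerAgent);
        for (std::size_t j = 0; j < contactsPerAgent; ++j)
            c.push_back(static_cast<std::uint32_t>(pick(rng)));
    }

    // One simulated step: every agent interacts only with its contacts, so
    // the work per step is contactsPerAgent * numAgents, not numAgents^2.
    std::uint64_t interactions = 0;
    for (std::size_t i = 0; i < numAgents; ++i)
        interactions += contacts[i].size();

    std::printf("interactions this step: %llu (worst case would be %.1e)\n",
                static_cast<unsigned long long>(interactions),
                static_cast<double>(numAgents) * static_cast<double>(numAgents));
    return 0;
}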

To save memory, each agent’s attributes could be stored in large group arrays rather than in individual agent objects, as indicated by Parker and Epstein [3]: each agent is not a separate object but simply an index into each array of a group class, saving the per-object overhead. Another method is agent compression, in which similar agents are grouped into an aggregate agent that behaves as a single agent [1]. Finally, one could simplify edges via an integer representation, such that storing similar edges as the value ‘4’ results in a lookup to a shared specification of type 4’s properties rather than each edge storing the full property set.
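The group-array and integer-edge ideas can be sketched in C++ as follows; the attribute names and edge categories here are illustrative placeholders, not part of any cited framework.

// memory_layout.cpp -- sketch of two of the memory reductions above:
// (a) a structure-of-arrays "group" class instead of one object per agent,
// (b) 1-byte edge type codes that index into a shared property table.
#include <array>
#include <cstdint>
#include <vector>

// (a) All agents live in parallel attribute arrays; agent i is just index i,
// so there is no per-agent object overhead.
struct AgentGroup {
    std::vector<float>        hunger;
    std::vector<float>        insecurity;
    std::vector<std::uint8_t> cooperated;  // last cooperation choice

    explicit AgentGroup(std::size_t n)
        : hunger(n, 0.0f), insecurity(n, 0.0f), cooperated(n, 0) {}
};

// (b) The full property set for each edge type is stored exactly once.
struct EdgeProperties {
    float interactionRate;
    float trust;
};

// Type codes: 0 = family, 1 = coworker, 2 = stranger (placeholder values).
static const std::array<EdgeProperties, 3> kEdgeTable = {{
    {1.0f, 0.9f}, {0.5f, 0.6f}, {0.1f, 0.2f}
}};

// Each edge stores only a target index and a 1-byte type code.
struct Edge {
    std::uint32_t target;
    std::uint8_t  typeCode;
};

int main() {
    AgentGroup agents(1000000);  // 10^6 stand-in for 10^10
    std::vector<std::vector<Edge>> adjacency(1000000);
    adjacency[0].push_back({1, 0});  // agent 0 has a "family" edge to agent 1

    // Edge properties are recovered via lookup rather than stored per edge.
    float rate = kEdgeTable[adjacency[0][0].typeCode].interactionRate;
    (void)rate;
    return 0;
}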

Ultimately, the impact of these “better case” algorithmic reductions varies with the computational platform, by which we mean enterprise cluster, server, or high-end consumer parallel programming platforms and architectures. For instance, in 2008 a GPU-based framework for agent-based modeling yielded over a 9000\(\times \) speedup compared to contemporary CPU-based implementations of SugarScape [1]. With impressive scale and speed on a very different (non-GPU) architecture, Repast HPC enables scalable, tightly coupled MPI and MPI+OpenMP simulations on HPC clusters [10]. Other, less tightly coupled simulation techniques have run on highly distributed commercial cloud infrastructure, potentially providing more cost-effective or accessible simulation platforms [11].

2.3 Tests at 1 Billion

To test the practicality of our global human well-being modeling objectives, we set an initial goal of simulating up to 1 billion agents. We used Repast HPC version 2.1 as the framework for our simulations, with Open MPI version 1.8.7 and the Intel v15 compiler. Our server was a Dell PowerEdge R920 with four 12-core 2.3 GHz Intel Xeon CPUs (E7-4850 v2) and 3 TB of RAM, running RHEL 6.

We used the well-known prisoner’s dilemma model [12] provided as part of a Repast tutorial. In this implementation, pairs of interacting agents randomly decide whether to cooperate with their partner; points are then assigned to each agent based on its own choice and its partner’s. The model was too computationally simple to represent calculations over multiple human attributes, so we increased the computational load by adding 1000 floating point operations per agent interaction per step. We also made small changes to the model’s graph instantiation (prior to the first step) that preserved the number of paired interactions per step but removed a non-linear instantiation cost.
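The following C++ sketch illustrates the kind of modification described above. It is not the Repast HPC tutorial code itself; it shows a conventional prisoner’s dilemma payoff (temptation 5, reward 3, punishment 1, sucker 0, an assumed scheme) plus a synthetic loop of roughly 1000 floating point operations per interaction, analogous to the workload we added.

// pd_interaction.cpp -- illustration of one prisoner's dilemma interaction
// with a synthetic floating point workload per interaction.
#include <cstdio>
#include <random>

struct Agent { double score = 0.0; };

double payoff(bool myCooperate, bool otherCooperate) {
    if ( myCooperate &&  otherCooperate) return 3.0;  // mutual cooperation
    if ( myCooperate && !otherCooperate) return 0.0;  // sucker's payoff
    if (!myCooperate &&  otherCooperate) return 5.0;  // temptation to defect
    return 1.0;                                       // mutual defection
}

// Stand-in for per-attribute computation: roughly 1000 floating point
// operations (500 multiply-add pairs).
double syntheticWork(double seed) {
    double x = seed;
    for (int i = 0; i < 500; ++i)
        x = x * 1.0000001 + 0.000001;
    return x;
}

void interact(Agent& a, Agent& b, std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);  // each agent cooperates at random
    bool aCoop = coin(rng);
    bool bCoop = coin(rng);
    // The synthetic work is folded in with negligible weight so the extra
    // computation is performed without altering the payoff dynamics.
    a.score += payoff(aCoop, bCoop) + 1e-12 * syntheticWork(a.score);
    b.score += payoff(bCoop, aCoop) + 1e-12 * syntheticWork(b.score);
}

int main() {
    std::mt19937 rng(7);
    Agent a, b;
    for (int step = 0; step < 100; ++step) interact(a, b, rng);
    std::printf("scores after 100 steps: %.6f %.6f\n", a.score, b.score);
    return 0;
}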

We began by testing how runtime scaled with increasing core counts. In our trials modeling 1 million agents for 100 steps, doubling the core count roughly halved the runtime up to 32 cores. Bridging multiple servers for additional cores introduced significant MPI communication overhead, and since our focus was shifting to RAM limitations, we used 32 cores on a single large memory server for the remainder of the tests. We then ran simulations with varying numbers of agents to test scalability. After our edits to the original Repast HPC code, we achieved linear scaling in both runtime and RAM usage as the number of agents increased. As a result, we were able to run a 100-step simulation of 1 billion agents in 29 h using 800 GB of RAM, as shown in Fig. 2.

Fig. 2. Runtime and RAM usage for varying agent counts using 32 cores for 100 steps.

3 Toward a Modular Framework for 10 Billion

3.1 Computational Limits

In this work we successfully performed a social simulation (based on the prisoner’s dilemma) of 1 billion human agents over 100 steps using a graph representation in roughly one day of compute time (29 h). This was possible on a single enterprise-class large memory server. The primary resource limitation was RAM consumption, which would have reached 8 TB for a 10 billion agent system. It is feasible to aggregate 8 TB of RAM across numerous HPC cluster compute nodes, but the MPI communication overhead and the cost of the many cluster nodes would be prohibitive compared to a single large memory server. Based on our findings, it was more important to re-organize code for a minimal memory footprint than to modify the per-step computations to reduce runtime. This, of course, depends entirely upon the chosen interaction model, but it highlights that memory requirements are a first-class design consideration for a platform handling 10 billion human agents.

3.2 Vision of a Global Collaboration Platform

The authors envision a community-evolved platform on which to develop and share human well-being models that scale up to the global population. Certainly, smaller scale or multi-scale models would also be supported, but we want to focus design efforts on developing infrastructure over the next 10 years that will facilitate increasingly accurate and complex models of up to 8 billion agents (the projected 2025 world population per 2015 United Nations estimates). The computational estimates presented here, our simulation results at 1 billion agents, and peer publications provide the foundation for continued project evolution. In Fig. 3 we provide one example vision of the distributed infrastructure that will allow for community driven evolution of the models.

The following are critical steps in initial development and implementation:

  • Establish an Initial Team of Subject Matter Experts (SMEs)

  • Scope Phase One Human Behavior Layers

  • Scope Phase One Aggregate Population Measures

  • Design Prototype Computational Architecture and Portal Interface

  • Develop a Public Messaging and Coordination Plan for Volunteer Participants

  • Identify Verification and Validation Measures, Including Regression Testing Against Historical Data Outcomes and Integration of Real-Time Insights from Deep Learning Applied to Various Big Data Sources

Fig. 3. Infrastructure for community driven human well-being simulation.

3.3 Future Work

Our team plans to extend additional human simulation models to the 1–10 billion agent scale across multiple computer architectures to better characterize the computational resource requirements of various model representations (geospatial, graph, aggregated objects, etc.). We plan to use open models publicly posted via the Open Agent Based Modeling Consortium (we are currently scaling up a model posted by Dr. Christopher Thron). Similarly, we hope to cross-reference the various model representations across common HPC, enterprise, and high-end consumer architectural platforms. The architectural cross-comparison will likely be more challenging, as some open codes written for MPI may not be algorithmically portable to accelerators (GPUs, FPGAs, etc.) and vice versa.