Predetermined Rollbacks: An extension to Time Warp for spatially parallel agent-based simulation
Introduction
Parallel and distributed processing techniques are widely used to execute computationally intensive simulations within acceptable time frames. Although parallel and distributed simulation (PADS) has been one of the most commonly studied areas of modeling and simulation in the last few decades, there is still considerable room for improvement. The time synchronization mechanism is a complex and crucial part of PADS. The two well-known primary approaches are conservative and optimistic time management. Conservative time management is one of the simplest synchronization techniques that can safely process logical processes (LPs). LPs that employ a conservative time management mechanism avoid executing out-of-order events [1]. An LP does not advance its simulation time unless it can guarantee that no message from the past will be received. Therefore, LPs must be tightly synchronized, which protects them against causality faults. However, conservative approaches may slow simulations down in some cases because of unnecessary synchronization [1]. Optimistic approaches have been proposed to overcome this problem [2]. Optimistic LPs do not synchronize their local times; instead, they execute events without considering the global time of the simulation. Therefore, an optimistic LP can receive a message from the past, which is referred to as a straggler message. When an LP receives a straggler message, it must resolve the resulting causality fault. In return, optimistic approaches provide a performance advantage by skipping some of the synchronization points [1].
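The straggler handling described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the class `OptimisticLP`, its full-state snapshots, and the string payloads are assumptions made for illustration, and anti-messages and global virtual time are omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int
    payload: str


class OptimisticLP:
    """Toy optimistic logical process (hypothetical, for illustration only)."""

    def __init__(self):
        self.lvt = 0                # local virtual time
        self.state = []             # payloads applied so far, in execution order
        self.processed = []         # events executed so far
        self.snapshots = [(0, [])]  # (lvt, copy of state) saved after every event
        self.rollbacks = 0

    def is_straggler(self, event):
        # A straggler is a message whose timestamp lies in the LP's past.
        return event.timestamp < self.lvt

    def _execute(self, event):
        self.state.append(event.payload)
        self.lvt = event.timestamp
        self.processed.append(event)
        self.snapshots.append((self.lvt, list(self.state)))

    def receive(self, event):
        if not self.is_straggler(event):
            self._execute(event)
            return
        # Causality fault: undo every event that is later than the straggler.
        self.rollbacks += 1
        redo = [e for e in self.processed if e.timestamp > event.timestamp]
        self.processed = [e for e in self.processed if e.timestamp <= event.timestamp]
        while self.snapshots[-1][0] > event.timestamp:
            self.snapshots.pop()
        self.lvt, saved = self.snapshots[-1]
        self.state = list(saved)
        # Execute the straggler, then re-execute the undone events in order.
        self._execute(event)
        for e in redo:
            self._execute(e)


lp = OptimisticLP()
for ev in (Event(10, "a"), Event(30, "c"), Event(20, "b")):
    lp.receive(ev)
```

In this run the event with timestamp 20 arrives after the LP has already advanced to time 30, so the LP restores the state saved at time 10 and re-executes the later event. A conservative LP would instead have waited before processing the event at time 30.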
Although Time Warp (TW) [2], the commonly used optimistic time management algorithm due to Jefferson, has been studied for many years, there is still room to optimize it for certain kinds of models. One of the drawbacks of TW is that LPs may initiate unnecessary rollbacks. We focus on improving the performance of TW for spatial agent-based simulations by avoiding unnecessary rollbacks even when a straggler message is received. In conventional TW, an LP that receives a straggler message must initiate a rollback operation to resolve possibly incorrect computations. Some straggler messages do not involve any real interaction among agents, yet they still force LPs to roll back. In this study, we propose an algorithm that lets LPs take timely precautionary measures against straggler messages that will not produce any interaction. The proposed algorithm is named Predetermined Rollbacks (PR). PR focuses on determining situations that would trigger rollbacks before they occur. Therefore, unnecessary rollbacks, which are among the most time-consuming components of optimistic approaches, can be eliminated. We present a case study, the Civil Violence model [3], which demonstrates that PR can significantly speed up spatially parallel simulations.
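The idea of skipping rollbacks for straggler messages that cannot produce an interaction can be sketched as follows. This excerpt does not give the actual decision mechanism of PR, so everything here is an assumption made for illustration: the grid positions, the square neighborhood test, and the function names `within_range`, `straggler_needs_rollback`, and `count_rollbacks` are all hypothetical. A faithful check would also have to use the agent positions that were valid at the straggler's timestamp, which this sketch takes as given.

```python
from typing import Iterable, Tuple

Position = Tuple[int, int]


def within_range(a: Position, b: Position, radius: int) -> bool:
    """Chebyshev-distance test: True if b lies in the square neighborhood of a."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= radius


def straggler_needs_rollback(
    straggler_pos: Position, local_positions: Iterable[Position], radius: int
) -> bool:
    """Assumed rule: roll back only if some local agent could interact."""
    return any(within_range(straggler_pos, p, radius) for p in local_positions)


def count_rollbacks(stragglers, local_positions, radius, filter_enabled):
    """Count rollbacks with the interaction filter switched on or off."""
    local_positions = list(local_positions)
    rollbacks = 0
    for pos in stragglers:
        # Without the filter, every straggler triggers a rollback.
        if not filter_enabled or straggler_needs_rollback(pos, local_positions, radius):
            rollbacks += 1
    return rollbacks


# Hypothetical data: two local agents and three straggler messages.
local_agents = [(0, 0), (5, 5)]
stragglers = [(1, 1), (10, 10), (6, 4)]

unfiltered = count_rollbacks(stragglers, local_agents, radius=1, filter_enabled=False)
filtered = count_rollbacks(stragglers, local_agents, radius=1, filter_enabled=True)
```

With these made-up positions, the straggler at (10, 10) has no local agent in its neighborhood, so the filtered variant performs one rollback fewer than the unfiltered one.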
Section snippets
Spatially parallel and distributed simulations
The parallelization of spatial agent-based models can be classified into two approaches: agent parallel and spatially parallel (i.e., environment parallel) [4]. In the former approach, agents are distributed among processors, so each processor is responsible for managing a subset of agents. In the latter approach, each processor is responsible for managing a subspace of the simulation area where agents live. Both approaches have their own pros and cons. For instance, handling agent interactions
Problem
In spite of the advantages of TW, there is still room for improvement. TW is based on rollback and recovery processes. When an LP receives a straggler message, it must restore a previous simulation state to overcome causality faults. Even if the straggler message does not produce any fault, the LP still initiates a rollback, which is unnecessary. In Fig. 2, for instance, LPA initiates a rollback operation as soon as it receives a straggler message at time 20. LPA does not attempt to evaluate
Related work
Time management mechanisms have been studied for more than three decades. An extensive literature review is presented in [18]. Studies have demonstrated that optimistic approaches exhibit significant performance improvements in PADS [19], [20], [21], [22]. Remarkably, a simulation speed record was broken by Barnes et al. [23]. However, it is not possible to claim that either the conservative approach or the optimistic approach is generally better in terms of performance. The advantages
Basics of predetermined rollbacks
We propose PR as an extension of TW to overcome the aforementioned problem of unnecessary rollbacks. In spatial simulations, the fact that agents are close to each other does not mean that they interact with each other. The motivation of PR is therefore to ignore some of the straggler messages without changing the simulation results. To this end, we develop a decision mechanism for LPs to detect straggler messages that do not produce a change in the simulation
Experimental environment
To demonstrate the manner in which PR can improve the performance of a spatially parallel simulation, we compared PR with conventional TW by using a basic form of the Civil Violence model.
Repast HPC is an open source, MPI-based, multi-agent and discrete event simulation tool developed by Argonne National Laboratory, Chicago, IL, USA [8]. Repast HPC provides spatial environments where agents live and interact within the neighborhood. LPs in Repast HPC must synchronize agents in their buffer
Discussion
Based on the presented results, it is evident that PR exhibits better performance than TW. The results also make it clear that the performance of PR is affected by the interaction probability of agents, which is dependent on agent density and buffer size. If agent density increases, the interaction probability also increases. The performance of both algorithms is similar when agent density is higher, as shown in Fig. 9(b). Fig. 11(a) and (b) clarify the reason for this convergence. In these
Conclusion
In this paper, we proposed a lightweight extension (PR) to TW to identify and avoid unnecessary rollbacks. PR exhibited greater speedups than TW. Because PR does not incur significant overhead, TW did not perform significantly better than PR in any of the experiments. As the interaction probability increased, the performance of TW and PR became identical. It is commonly known that optimistic approaches may not be the best option if LPs must frequently communicate with each
Acknowledgments
We would like to thank TUBITAK ULAKBIM High Performance and Grid Computing Center, Ankara, Turkey, for the computing services provided for our experimental work.
References (67)
- et al. Run-time selection of the checkpoint interval in time warp based simulations. Simul. Pract. Theory (1998)
- et al. Analysis of the increase and decrease algorithms for congestion avoidance in computer networks. Comput. Netw. ISDN Syst. (1989)
- et al. Synchronization methods in parallel and distributed discrete-event simulation. Simul. Modell. Pract. Theory (2013)
- et al. Exploiting intra-object dependencies in parallel simulation. Inf. Process. Lett. (1999)
- et al. Efficient execution of replicated transportation simulations with uncertain vehicle trajectories. Procedia Comput. Sci. (2015)
- Parallel and Distributed Simulation Systems (2000)
- Virtual time. ACM Trans. Program. Lang. Syst. (1985)
- Modeling civil violence: an agent-based computational approach. Proc. Natl. Acad. Sci. (2002)
- et al. Large Scale Agent-Based Modelling: A Review and Guidelines for Model Scaling. Agent-Based Models of Geographical Systems (2012)
- et al. Parallelization strategies for spatial agent-based models. Int. J. Parallel. Program (2017)
- Kilocore: a fine-grained 1,000 processor array for task-parallel applications. IEEE Micro
- A 340 mV-to-0.9 V 20.2 Tb/s source-synchronous hybrid packet/circuit-switched 16 x 16 network-on-chip in 22 nm tri-gate CMOS. IEEE J. Solid-State Circuits
- Parallel agent-based simulation with repast for high performance computing. Simulation
- On rolling back and checkpointing in time warp. IEEE Trans. Parallel Distrib. Syst.
- A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv.
- On the Trade-off between Time and Space in Optimistic Parallel Discrete-Event Simulation. 6th Workshop on Parallel and Distributed Simulation
- Selecting the checkpoint interval in time warp simulation. ACM SIGSIM Simul. Digest
- Adaptive checkpointing in time warp. ACM SIGSIM Simul. Digest
- Comparative analysis of periodic state saving techniques in time warp simulators. Proceedings of the 9th Workshop on Parallel and distributed simulation
- A cost model for selecting checkpoint positions in time warp parallel simulation. IEEE Trans. Parallel Distrib. Syst.
- Conservative vs. optimistic parallel simulation of devs and cell-devs: a comparative study. Proceedings of the 2010 Summer Computer Simulation Conference
- Scaling time warp-based discrete event execution to 10^4 processors on a Blue Gene supercomputer. Proceedings of the 4th international conference on computing frontiers
- Scalable time warp on Blue Gene supercomputers. Proceedings of the 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation
- Repast HPC with Optimistic Time Management. Proceedings of the 24th High Performance Computing Symposium
- Nemo: a massively parallel discrete-event simulation model for neuromorphic architectures. ACM Trans. Model. Comput. Simul. (TOMACS)
- Warp speed: executing time warp on 1,966,080 cores. Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
- Distributed simulation systems. Proceedings of the 35th Conference on Winter Simulation: Driving Innovation
- Virtual time III: unification of conservative and optimistic synchronization in parallel discrete event simulation. 2017 Winter Simulation Conference (WSC)
- Rollback mechanisms for optimistic distributed simulation systems. SCS Multiconference on Distributed Simulation, San Diego, CA, USA
- A study of time warp rollback mechanisms. ACM Trans. Model. Comput. Simul. (TOMACS)
- Optimising Time Warp: Lazy Rollback and Lazy Reevaluation
- Improving the Performance of Optimistic Time Management Mechanism with Sub-State Saving. Proceedings of the 25th High Performance Computing Symposium
- Using message semantics to reduce rollback in the time warp mechanism
Cited by (2)
Evaluating Parallelization Strategies for Large-Scale Individual-based Infectious Disease Simulations
2023, Proceedings - Winter Simulation Conference
Multi-scale filtering synchronization method for vehicle-infrastructure cooperative twin-simulation testing
2022, Jiaotong Yunshu Gongcheng Xuebao/Journal of Traffic and Transportation Engineering