Predetermined Rollbacks: An extension to Time Warp for spatially parallel agent-based simulation

https://doi.org/10.1016/j.simpat.2019.04.008

Abstract

Time management is an important factor affecting the speed of parallel and distributed simulations. Conservative time management mechanisms advance simulation time only after logical processes have synchronized, whereas optimistic mechanisms loosen synchronization among processes to speed up simulation. Because of this loosened synchronization, optimistic approaches are vulnerable to causality faults, which logical processes must address at run time. Repairing a simulation by means of a rollback mechanism is one of the most time-consuming operations in optimistic approaches. In this paper, we propose a method that takes precautionary measures against possible future rollbacks in Jefferson's Time Warp mechanism. Our proposed method, named Predetermined Rollbacks, uses a modified simulation engine that can detect and avoid unnecessary rollbacks. Our experiments demonstrate that the proposed method can significantly improve the speedup of Time Warp in agent-based simulations where agents communicate in a shared environment.

Introduction

Parallel and distributed processing techniques are widely used to execute computationally intensive simulations within acceptable time frames. Although parallel and distributed simulation (PADS) has been one of the most commonly studied areas of modeling and simulation over the last few decades, there is still considerable room for improvement. The time synchronization mechanism is a complex and crucial part of PADS. The two primary, well-known approaches are conservative and optimistic time management. Conservative time management is one of the simplest synchronization techniques for safely executing logical processes (LPs). LPs that employ a conservative time management mechanism avoid executing out-of-order events [1]. An LP does not advance its simulation time unless it can guarantee that no message from the past will be received. Therefore, LPs must be tightly synchronized, which protects them against causality faults. However, conservative approaches may slow simulations down in some cases because of unnecessary synchronization [1]. Optimistic approaches have been proposed to overcome this problem [2]. Optimistic LPs do not synchronize their local times; instead, they execute events without considering the global time of the simulation. Consequently, an optimistic LP can receive a message from the past, which is referred to as a straggler message. When an LP receives a straggler message, it must address the resulting causality fault. In exchange, optimistic approaches gain a performance advantage by skipping some of the synchronization points [1].
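To make the notion of a straggler message concrete, the following is a minimal sketch of an optimistic LP, written for illustration only. The class and field names (`Event`, `OptimisticLP`, `payload`) and the toy state update are assumptions, not taken from the paper or from any simulation library, and anti-messages and global virtual time are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int
    payload: int


class OptimisticLP:
    """Toy optimistic logical process (hypothetical, for illustration)."""

    def __init__(self):
        self.lvt = 0          # local virtual time
        self.state = 0        # toy simulation state
        self.history = []     # (event, state before the event), in execution order
        self.rollbacks = 0

    def _execute(self, event):
        # Save the state first so the event can be undone later.
        self.history.append((event, self.state))
        # Order-dependent update (an assumption) so that event order matters.
        self.state = self.state * 2 + event.payload
        self.lvt = event.timestamp

    def receive(self, event):
        if event.timestamp >= self.lvt:
            self._execute(event)          # in order: process optimistically
            return
        # Straggler: its timestamp lies in this LP's past, so roll back.
        self.rollbacks += 1
        undone = []
        while self.history and self.history[-1][0].timestamp > event.timestamp:
            past_event, state_before = self.history.pop()
            self.state = state_before     # restore the saved state
            undone.append(past_event)
        self.lvt = self.history[-1][0].timestamp if self.history else 0
        # Re-execute the straggler and the undone events in timestamp order.
        for e in sorted(undone + [event], key=lambda e: e.timestamp):
            self._execute(e)


lp = OptimisticLP()
for ts, value in [(10, 1), (30, 2), (20, 3)]:   # the event at time 20 is a straggler
    lp.receive(Event(ts, value))
print(lp.state, lp.rollbacks)   # 12 1
```

The final state equals the one obtained by executing the three events in timestamp order, which is the guarantee a rollback is meant to restore. The cost is the saved state per event and the re-execution of everything after the straggler's timestamp.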

Although Time Warp (TW) [2], a commonly used optimistic time management algorithm proposed by Jefferson, has been studied for many years, there is still room for optimization for certain kinds of models. One drawback of TW is that LPs may initiate unnecessary rollbacks. We focus on improving the performance of TW for spatial agent-based simulations by skipping unnecessary rollbacks even when a straggler message is received. In conventional TW, an LP that receives a straggler message must initiate a rollback to undo possibly incorrect computations. Some straggler messages do not involve any real interaction among agents, yet they still force LPs to roll back. In this study, we propose an algorithm that gives LPs timely precautionary measures against straggler messages that will not produce any interaction. The proposed algorithm is named Predetermined Rollbacks (PR). PR focuses on identifying situations that would trigger rollbacks before they occur. As a result, unnecessary rollbacks, which are among the most time-consuming components of optimistic approaches, can be eliminated. We present a case study, the Civil Violence model [3], which demonstrates that PR can significantly speed up spatially parallel simulations.
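The idea that a straggler without any real interaction need not trigger a rollback can be illustrated with a simple spatial filter. This is a simplified sketch under stated assumptions and not the PR algorithm itself, whose details are not given in this excerpt. It assumes that the straggler carries the position of an incoming agent that then stays put, that the LP records where its local agents were at each time step (`Footprint`), and that interaction requires being within a Euclidean `radius`. All names here are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Straggler:
    timestamp: int   # simulation time at which the incoming agent arrived
    x: float
    y: float


@dataclass(frozen=True)
class Footprint:
    """Position a local agent occupied at a given simulation time (assumed log)."""
    timestamp: int
    x: float
    y: float


def needs_rollback(msg, footprints, lvt, radius):
    """Return True if the straggler could have interacted with a local agent.

    Only footprints recorded between the straggler's timestamp and the LP's
    local virtual time (lvt) matter, because only that interval was simulated
    without knowledge of the incoming agent.
    """
    for fp in footprints:
        in_window = msg.timestamp <= fp.timestamp <= lvt
        if in_window and hypot(fp.x - msg.x, fp.y - msg.y) <= radius:
            return True
    return False


# One local agent moving from (0, 0) to (8, 8) while the LP advanced to time 30.
log = [Footprint(10, 0.0, 0.0), Footprint(20, 1.0, 1.0), Footprint(30, 8.0, 8.0)]

print(needs_rollback(Straggler(20, 5.0, 5.0), log, lvt=30, radius=2.0))  # False
print(needs_rollback(Straggler(20, 9.0, 9.0), log, lvt=30, radius=2.0))  # True
```

In the first call the incoming agent was never within range of the local agent during the affected interval, so under these assumptions the LP could accept the message without restoring an earlier state. In the second call the agents come within range at time 30, so a conventional rollback would still be required.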

Section snippets

Spatially parallel and distributed simulations

The parallelization of spatial agent-based models follows one of two approaches: agent parallel and spatially parallel (i.e., environment parallel) [4]. In the former, agents are distributed among processors, so each processor manages a subset of agents. In the latter, each processor manages a subspace of the simulation area where agents live. Both approaches have their own pros and cons. For instance, handling agent interactions

Problem

In spite of the advantages of TW, there is still room for improvement. TW is based on rollback and recovery. When an LP receives a straggler message, it must restore a previous simulation state to overcome the causality fault. LPs initiate a rollback even if the straggler message does not produce any fault, which makes the rollback unnecessary. In Fig. 2, for instance, LPA initiates a rollback operation as soon as it receives a straggler message at time 20. LPA does not attempt to evaluate

Related work

Time management mechanisms have been studied for more than three decades. An extensive literature review is presented in [18]. Studies have demonstrated that optimistic approaches yield significant performance improvements in PADS [19], [20], [21], [22]. Notably, Barnes et al. set a simulation speed record [23]. However, it is not possible to claim that either the conservative or the optimistic approach is generally better in terms of performance. The advantages

Basics of predetermined rollbacks

We propose PR as an extension of TW to overcome the aforementioned problem of unnecessary rollbacks. In spatial simulations, the fact that agents are close to each other does not mean that they interact. The motivation of PR is therefore to ignore some straggler messages without changing the simulation results. To this end, we develop a decision mechanism for LPs to detect straggler messages that do not produce a change in the simulation

Experimental environment

To demonstrate how PR can improve the performance of a spatially parallel simulation, we compared PR with conventional TW using a basic form of the Civil Violence model.

Repast HPC is an open source, MPI-based, multi-agent and discrete event simulation tool developed by Argonne National Laboratory, Chicago, IL, USA [8]. Repast HPC provides spatial environments where agents live and interact within the neighborhood. LPs in Repast HPC must synchronize agents in their buffer

Discussion

Based on the presented results, it is evident that PR exhibits better performance than TW. The results also make it clear that the performance of PR is affected by the interaction probability of agents, which is dependent on agent density and buffer size. If agent density increases, the interaction probability also increases. The performance of both algorithms is similar when agent density is higher, as shown in Fig. 9(b). Fig. 11(a) and (b) clarify the reason for this convergence. In these

Conclusion

In this paper, we proposed a lightweight extension (PR) to TW to identify and avoid unnecessary rollbacks. PR exhibited greater speedups than TW. Because PR does not incur significant overhead, TW did not perform significantly better than PR in any of the experiments. As the interaction probability increased, the performance of TW and PR converged. It is commonly known that optimistic approaches may not be the best option if LPs must frequently communicate with each

Acknowledgments

We would like to thank TUBITAK ULAKBIM High Performance and Grid Computing Center, Ankara, Turkey, for the computing services provided for our experimental work.

References (67)

  • B. Bohnenstiehl et al. Kilocore: a fine-grained 1,000 processor array for task-parallel applications. IEEE Micro (2017)
  • G. Chen et al. A 340 mV-to-0.9 V 20.2 Tb/s source-synchronous hybrid packet/circuit-switched 16 x 16 network-on-chip in 22 nm tri-gate CMOS. IEEE J. Solid-State Circuits (2015)
  • N. Collier et al. Parallel agent-based simulation with Repast for high performance computing. Simulation (2013)
  • H. Avril et al. On rolling back and checkpointing in time warp. IEEE Trans. Parallel Distrib. Syst. (2001)
  • E.N.M. Elnozahy et al. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. (2002)
  • B.R. Preiss et al. On the trade-off between time and space in optimistic parallel discrete-event simulation. 6th Workshop on Parallel and Distributed Simulation (1992)
  • Y.-B. Lin et al. Selecting the checkpoint interval in time warp simulation. ACM SIGSIM Simul. Digest (1993)
  • R. Rönngren et al. Adaptive checkpointing in time warp. ACM SIGSIM Simul. Digest (1994)
  • J. Fleischmann et al. Comparative analysis of periodic state saving techniques in time warp simulators. Proceedings of the 9th Workshop on Parallel and Distributed Simulation (1995)
  • F. Quaglia. A cost model for selecting checkpoint positions in time warp parallel simulation. IEEE Trans. Parallel Distrib. Syst. (2001)
  • S. Jafer et al. Conservative vs. optimistic parallel simulation of DEVS and Cell-DEVS: a comparative study. Proceedings of the 2010 Summer Computer Simulation Conference (2010)
  • K.S. Perumalla. Scaling time warp-based discrete event execution to 10^4 processors on a Blue Gene supercomputer. Proceedings of the 4th International Conference on Computing Frontiers (2007)
  • D.W. Bauer Jr. et al. Scalable time warp on Blue Gene supercomputers. Proceedings of the 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation (2009)
  • B.K. Görür et al. Repast HPC with optimistic time management. Proceedings of the 24th High Performance Computing Symposium (2016)
  • M. Plagge et al. NeMo: a massively parallel discrete-event simulation model for neuromorphic architectures. ACM Trans. Model. Comput. Simul. (TOMACS) (2018)
  • P.D. Barnes et al. Warp speed: executing time warp on 1,966,080 cores. Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2013)
  • R.M. Fujimoto. Distributed simulation systems. Proceedings of the 35th Conference on Winter Simulation: Driving Innovation (2003)
  • D.R. Jefferson et al. Virtual time III: unification of conservative and optimistic synchronization in parallel discrete event simulation. 2017 Winter Simulation Conference (WSC) (2017)
  • A. Gafni. Rollback mechanisms for optimistic distributed simulation systems. SCS Multiconference on Distributed Simulation, San Diego, CA, USA (1988)
  • Y.-B. Lin et al. A study of time warp rollback mechanisms. ACM Trans. Model. Comput. Simul. (TOMACS) (1991)
  • D. West. Optimising time warp: lazy rollback and lazy reevaluation (1988)
  • B.K. Görür et al. Improving the performance of optimistic time management mechanism with sub-state saving. Proceedings of the 25th High Performance Computing Symposium (2017)
  • H.V. Leong et al. Using message semantics to reduce rollback in the time warp mechanism
