ABSTRACT
Many parallel programs are intended to yield deterministic results, yet unpredictable thread or process interleavings can introduce subtle bugs and nondeterminism. We are exploring a producer-consumer memory model, SPMC, for efficient system-enforced deterministic parallelism. However, the model's previous eager page mapping wastes physical memory and cannot support large, real-world applications. This paper presents a novel lazy tree mapping approach that introduces a "shadow page table" to allocate pages on demand, and extends an SPMC region with a tree of lazily generated pages, representing an infinite stream while reusing a finite range of virtual addresses. To make SPMC more practical, we build Dlinux, which emulates the SPMC model entirely in Linux user space: it uses virtual memory to emulate physical pages and sets up page tables at user level to emulate lazy tree mapping. Atop SPMC, we design DetMP and DetMPI and integrate them into Dlinux, offering deterministic message-passing programming at both thread and process level. Experimental evaluations suggest that lazy tree mapping improves memory utilization and address reuse. Dlinux scales nearly ideally on matmult with 2048*2048 matrices, and outperforms MPICH2 on some workloads with larger input datasets.