ABSTRACT
Recently we built a system that uses profiling data to automatically parallelize Mercury programs by finding conjunctions with expensive conjuncts that can run in parallel with minimal synchronization delays. This worked very well in many cases, but in cases of tail recursion, we got much lower speedups than we expected, due to excessive memory usage. In this paper, we present a novel program transformation that eliminates this problem, and also allows recursive calls inside parallel conjunctions to take advantage of tail recursion optimization. Our benchmark results show that our new transformation greatly increases the speedups we can get from parallel Mercury programs; in one case, it changes no speedup into almost perfect speedup on four cores.
Index Terms
- Controlling loops in parallel Mercury code