
PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Published online by Cambridge University Press:  13 July 2016

JOST BERTHOLD*
Affiliation:
Commonwealth Bank of Australia, Sydney (e-mail: jberthold@acm.org)
HANS-WOLFGANG LOIDL
Affiliation:
School of Mathematical and Computer Sciences, Heriot-Watt University (e-mail: hwloidl@macs.hw.ac.uk)
KEVIN HAMMOND
Affiliation:
School of Computer Science, University of St Andrews (e-mail: kevin@kevinhammond.net)
*Corresponding author. The reported work was performed while at the Dept. of Computer Science (DIKU), University of Copenhagen.

Abstract

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at a variety of scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones giving the programmer full control. Most current designs assume a shared-memory model at the programmer, implementation and hardware levels. This assumption is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads, such as garbage-collection synchronisation. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing (PAEAN) runtime system design aims to provide a portable, high-level, shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends the existing, well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PAEAN abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
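The shared-memory parallelism support in GHC that the abstract refers to is exposed to programmers through lightweight "sparks". As a minimal, hypothetical sketch (illustrative only, not code from the paper), the following GpH-style fragment uses `par` and `pseq` from GHC's base library to annotate parallelism while leaving work distribution entirely to the runtime system:

```haskell
import GHC.Conc (par, pseq)

-- Classic parallel benchmark: nfib n counts the number of calls
-- a naive Fibonacci makes. `x `par` e` sparks x for possible
-- parallel evaluation; `y `pseq` e` forces y before evaluating e,
-- so the parent does useful work while the spark may run elsewhere.
nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = x `par` (y `pseq` (x + y + 1))
  where
    x = nfib (n - 1)
    y = nfib (n - 2)

main :: IO ()
main = print (nfib 10)  -- prints 177
```

Compiled with `ghc -threaded` and run with `+RTS -N`, the annotations are purely advisory: on a single core the sparks are simply evaluated sequentially, which is exactly the kind of implicit, runtime-managed model the PAEAN design seeks to carry over to shared-nothing machines.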

Type
Articles
Copyright
Copyright © Cambridge University Press 2016 
