ABSTRACT
Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle.
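The granularity tradeoff described above can be illustrated with a simple chunked parallel sum. This is a Python sketch, not Manticore's implementation; the fixed `chunk_size` parameter is the illustrative knob — too small and the task count (and scheduling overhead) explodes, too large and idle processors go unfed on irregular work:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(xs, chunk_size):
    """Sum xs by cutting it into fixed-size chunks and reducing each
    chunk as a separate task.  Returns (total, number_of_tasks) so the
    overhead side of the tradeoff is visible."""
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials), len(chunks)

total, tasks = parallel_sum(list(range(1000)), chunk_size=100)
# 1000 elements at chunk_size=100 yields 10 tasks
```

Static chunking like this is exactly what dynamic schemes such as Lazy Binary Splitting avoid: the decomposition is decided at run time, based on whether other workers are actually idle.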
Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
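The core splitting decision can be sketched as follows. This is a much-simplified Python illustration, not PML code: the tuple-based rope representation and the `hungry` predicate are stand-ins for Manticore's ropes and its test for idle workers, and a real scheduler would also poll for hunger while processing leaf elements rather than only at internal nodes:

```python
def seq_map(f, node):
    """Map f over a rope -- either ('leaf', elements) or
    ('cat', left, right) -- entirely sequentially."""
    if node[0] == 'leaf':
        return [f(x) for x in node[1]]
    return seq_map(f, node[1]) + seq_map(f, node[2])

def map_lts(f, rope, hungry):
    """Lazy splitting: if no worker is hungry, process the whole
    subtree sequentially with zero splitting overhead; otherwise
    split the node so its halves become separately stealable tasks
    (here both halves are just processed recursively)."""
    if rope[0] == 'leaf' or not hungry():
        return seq_map(f, rope)
    # In a real scheduler the right half would be pushed to a
    # work-stealing deque for another worker to claim.
    return map_lts(f, rope[1], hungry) + map_lts(f, rope[2], hungry)
```

Either extreme of the `hungry` predicate produces the same result; what changes is how many stealable tasks are exposed, which is precisely the robustness property the paper claims for LTS.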