DOI: 10.1145/1863543.1863558

Lazy tree splitting

Published: 27 September 2010

ABSTRACT

Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle.
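To make the decomposition tradeoff concrete, the following is a minimal sketch, in OCaml rather than the paper's Parallel ML, of the conventional alternative: eager binary splitting over a flat array with a hand-tuned grain size. The names (`map_range`, `par_map`) and the `grain` cutoff are illustrative assumptions, not the paper's code; the cutoff is precisely the per-program, per-platform tuning knob that this work aims to do without.

```ocaml
(* Eager binary splitting with a hand-tuned [grain] cutoff -- a sketch of
   the chunking problem, not the paper's implementation.  A grain that is
   too small creates many tiny tasks (parallel overhead); one that is too
   large leaves processors idle behind big sequential chunks. *)
let rec map_range f src dst lo hi grain =
  if hi - lo <= grain then
    (* chunk is small enough: process it sequentially *)
    for i = lo to hi - 1 do dst.(i) <- f src.(i) done
  else begin
    let mid = (lo + hi) / 2 in
    (* eagerly split in half; run the left half in another domain *)
    let left = Domain.spawn (fun () -> map_range f src dst lo mid grain) in
    map_range f src dst mid hi grain;
    Domain.join left
  end

(* [par_map f src grain] maps [f] over [src] in parallel.  Spawning an
   OCaml domain per split is only for illustration; a real scheduler
   would reuse a fixed pool of workers. *)
let par_map f src grain =
  let n = Array.length src in
  if n = 0 then [||]
  else begin
    let dst = Array.make n (f src.(0)) in
    map_range f src dst 0 n grain;
    dst
  end

(* Example: [par_map (fun x -> x * x) (Array.init 1_000 Fun.id) 256]. *)
```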

Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
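The control structure behind LTS can be sketched as follows; this is an illustrative OCaml rendering under stated assumptions, not the PML code from the paper. Parallel arrays are represented as binary trees with data at the leaves, and the hypothetical `hungry` hook stands in for the scheduler's idle-worker check (in Manticore, a glance at the work-stealing deques). The paper's actual implementation is more refined and can split the remaining work at any point in a traversal; the essential point captured here is that splitting happens only on demand, so no grain-size tuning is required.

```ocaml
(* A sketch of the lazy-splitting control structure, not the paper's PML
   code.  [hungry] is an assumed hook for "is some worker idle?"; when it
   is false we stay sequential and pay no splitting overhead, forking
   only when the extra parallelism would actually be used. *)
type 'a rope =
  | Leaf of 'a array            (* data lives at the leaves *)
  | Cat  of 'a rope * 'a rope   (* concatenation node *)

let rec lts_map ~hungry f r =
  match r with
  | Leaf a -> Leaf (Array.map f a)
  | Cat (l, rt) ->
      if hungry () then begin
        (* someone is idle: split here and hand the left subtree off *)
        let t = Domain.spawn (fun () -> lts_map ~hungry f l) in
        let rt' = lts_map ~hungry f rt in
        Cat (Domain.join t, rt')
      end else
        (* nobody is idle: keep going sequentially *)
        Cat (lts_map ~hungry f l, lts_map ~hungry f rt)

(* Example with a stubbed hunger test that never fires, so the whole
   traversal runs sequentially:
     lts_map ~hungry:(fun () -> false) (fun x -> x + 1)
             (Cat (Leaf [|1; 2; 3|], Leaf [|4; 5; 6|])) *)
```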


Supplemental Material

icfp-mon-1605-rainey.mov (MOV video, 96.5 MB)



            Published in

            ICFP '10: Proceedings of the 15th ACM SIGPLAN international conference on Functional programming, September 2010, 398 pages. ISBN: 9781605587943. DOI: 10.1145/1863543.

            ACM SIGPLAN Notices, Volume 45, Issue 9 (ICFP '10), September 2010, 382 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/1932681.

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States



            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate: 333 of 1,064 submissions, 31%
