
Accelerating Nested Data Parallelism: Preserving Regularity

  • Conference paper
  • In: Euro-Par 2020: Parallel Processing (Euro-Par 2020)

Abstract

Irregular nested data-parallelism is a powerful programming model that can express a large class of parallel algorithms. However, it is notoriously difficult to compile such programs to efficient code for modern parallel architectures. Regular data-parallelism, on the other hand, is much easier to compile to efficient code, but is too restricted to express some problems conveniently, or to exploit their full parallelism. We extend the regular data-parallel programming model to allow for the parallel execution of array-level conditionals and of iterations over irregular nested structures, and present two novel static analyses that optimise the code generated for these programs, reducing the cost of this more powerful irregular model. We present benchmarks supporting our claim that these extensions are both effective and feasible: they make it possible to exploit the full parallelism of an important class of algorithms, and together with our optimisations they improve absolute performance over an implementation limited to exploiting only regular parallelism.
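To illustrate the kind of representation that makes irregular nested structures amenable to regular data-parallel execution, the following sketch (our own illustration, not the paper's implementation; all names are hypothetical) stores an irregular nested array as a flat data vector plus a segment descriptor, so that a nested operation such as a per-inner-array sum becomes a single regular segmented reduction:

```haskell
-- Sketch only: the classic flattened representation of an irregular
-- nested array. An array of arrays of differing lengths is stored as
-- one flat data vector plus a segment descriptor recording the length
-- of each inner array.

data Nested a = Nested
  { segLens :: [Int] -- length of each inner segment
  , flatDat :: [a]   -- all elements, concatenated
  } deriving Show

-- Convert an irregular list of lists to the flat representation.
flatten :: [[a]] -> Nested a
flatten xss = Nested (map length xss) (concat xss)

-- Segmented reduction: one result per inner segment. On a data-parallel
-- backend this corresponds to a single regular segmented fold over the
-- flat data vector, rather than one parallel loop per inner array.
segSum :: Num a => Nested a -> [a]
segSum (Nested lens xs) = go lens xs
  where
    go []     _  = []
    go (n:ns) ys = let (seg, rest) = splitAt n ys
                   in sum seg : go ns rest
```

For example, `flatten [[1,2],[],[3,4,5]]` yields `Nested [2,0,3] [1,2,3,4,5]`, and applying `segSum` to that value gives `[3,0,12]`; empty segments cost nothing extra, which is one reason the flat form suits irregular data.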


Notes

  1. https://github.com/AccelerateHS/accelerate-examples/tree/master/examples/quicksort

References

  1. Bergstrom, L., Fluet, M., Rainey, M., Reppy, J., Rosen, S., Shaw, A.: Data-only flattening for nested data parallelism. In: Principles and Practice of Parallel Programming (2013)

  2. Blelloch, G.E., Sabot, G.W.: Compiling collection-oriented languages onto massively parallel computers. Parallel Distrib. Comput. 8(2), 119–134 (1990)

  3. Chakravarty, M.M.T., Keller, G., Lee, S., McDonell, T.L., Grover, V.: Accelerating Haskell array codes with multicore GPUs. In: Declarative Aspects of Multicore Programming (2011)

  4. Chatterjee, S., Blelloch, G.E., Zagha, M.: Scan primitives for vector computers. In: Supercomputing (1990)

  5. Clifton-Everest, R., McDonell, T.L., Chakravarty, M.M.T., Keller, G.: Embedding foreign code. In: Practical Aspects of Declarative Languages (2014)

  6. Clifton-Everest, R., McDonell, T.L., Chakravarty, M.M.T., Keller, G.: Streaming irregular arrays. In: Haskell (2017)

  7. Elsman, M., Henriksen, T., Serup, N.G.W.: Data-parallel flattening by expansion. In: Libraries, Languages and Compilers for Array Programming (2019)

  8. Elsman, M., Larsen, K.F.: Efficient translation of certain irregular data-parallel array comprehensions. In: Trends in Functional Programming (2020)

  9. Fluet, M., Rainey, M., Reppy, J., Shaw, A., Xiao, Y.: Manticore: a heterogeneous parallel language. In: Declarative Aspects of Multicore Programming (2007)

  10. Grelck, C.: Single assignment C (SAC): high productivity meets high performance. In: Central European Functional Programming School (2012)

  11. van den Haak, L., McDonell, T., Keller, G., de Wolff, I.G.: Artifact for Euro-Par 2020 paper Accelerating Nested Data Parallelism: Preserving Regularity (July 2020). https://doi.org/10.6084/m9.figshare.12555275

  12. Henriksen, T., Elsman, M., Oancea, C.E.: Size slicing: a hybrid approach to size inference in Futhark. In: Functional High-Performance Computing (2014)

  13. Henriksen, T., Serup, N.G.W., Elsman, M., Henglein, F., Oancea, C.E.: Futhark: purely functional GPU-programming with nested parallelism and in-place array updates. In: Programming Language Design and Implementation (2017)

  14. Henriksen, T., Thorøe, F., Elsman, M., Oancea, C.: Incremental flattening for nested data parallelism. In: Principles and Practice of Parallel Programming (2019)

  15. Hoare, C.A.: Quicksort. Comput. J. 5(1), 10–16 (1962)

  16. Keller, G., Chakravarty, M.M.T., Leshchinskiy, R., Lippmeier, B., Peyton Jones, S.: Vectorisation avoidance. In: Haskell (2012)

  17. McDonell, T.L., Chakravarty, M.M.T., Grover, V., Newton, R.R.: Type-safe runtime code generation: Accelerate to LLVM. In: Haskell (2015)

  18. McDonell, T.L., Chakravarty, M.M.T., Keller, G., Lippmeier, B.: Optimising purely functional GPU programs. In: International Conference on Functional Programming (2013)

  19. Peyton Jones, S., Leshchinskiy, R., Keller, G., Chakravarty, M.M.T.: Harnessing the multicores: nested data parallelism in Haskell. In: Foundations of Software Technology and Theoretical Computer Science (2008)

  20. Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In: Programming Language Design and Implementation (2013)

  21. Reppy, J., Sandler, N.: Nessie: a NESL to CUDA compiler. In: Compilers for Parallel Computing (2015)

  22. Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Graphics Hardware (2007)

  23. Steuwer, M., Remmelg, T., Dubach, C.: Lift: a functional data-parallel IR for high-performance GPU code generation. In: Code Generation and Optimization (2017)

  24. Svensson, B.J., Sheeran, M., Claessen, K.: Obsidian: a domain specific embedded language for parallel programming of graphics processors. In: Implementation and Application of Functional Languages, pp. 156–173 (January 2008)


Acknowledgements and Data Availability Statement

We would like to thank the reviewers for their detailed feedback and suggestions, including on possible future work directions, and Manuel Chakravarty for his comments on an earlier version of this paper.

The datasets and code generated during and/or analysed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.12555275 [11].

Author information


Corresponding author

Correspondence to Lars B. van den Haak.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

van den Haak, L.B., McDonell, T.L., Keller, G.K., de Wolff, I.G. (2020). Accelerating Nested Data Parallelism: Preserving Regularity. In: Malawski, M., Rzadca, K. (eds) Euro-Par 2020: Parallel Processing. Euro-Par 2020. Lecture Notes in Computer Science, vol. 12247. Springer, Cham. https://doi.org/10.1007/978-3-030-57675-2_27


  • DOI: https://doi.org/10.1007/978-3-030-57675-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57674-5

  • Online ISBN: 978-3-030-57675-2

  • eBook Packages: Computer Science, Computer Science (R0)
