Research article, DOI: 10.1145/3314221.3314612

Modular divide-and-conquer parallelization of nested loops

Published: 08 June 2019

ABSTRACT

We propose a methodology for automatically generating divide-and-conquer parallel implementations of sequential nested loops. We focus on a class of loops that traverse read-only multidimensional collections (lists or arrays) and compute a function over these collections. Our approach is modular in that the inner loop nest is first abstracted away to produce a simpler loop nest, and this summarized version of the loop nest is then parallelized. The main challenge addressed by this paper is that, to perform the code transformations necessary in each step, the loop nest may have to be augmented (automatically) with extra computation that makes the abstraction and/or the parallelization possible. We present theoretical results that justify the correctness of our modular approach, and algorithmic solutions for its automation. Experimental results demonstrate that our approach can parallelize highly non-trivial loop nests efficiently.
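To make the idea concrete, here is a minimal hand-written sketch in Python. It is not output of the authors' tool and not taken from the paper's text. It assumes a hypothetical example problem, the "maximum top strip" of a 2D array: the largest sum of the first k rows, for any k >= 0. The inner loop is summarized as a row sum, and the outer loop is augmented with an extra accumulator (the running total) so that results for two halves of the rows can be combined by a `join` function. All names (`leaf`, `join`, `max_top_strip_dc`) are illustrative assumptions.

```python
def max_top_strip_seq(a):
    """Sequential nested loop: largest sum of the first k rows (k >= 0)."""
    best = 0
    total = 0
    for row in a:
        for x in row:          # inner loop: accumulates the row into total
            total += x
        best = max(best, total)
    return best


def leaf(row):
    """Summarize one row: the inner loop abstracted to a single row sum.

    Returns (total, best). 'total' is the auxiliary value the outer loop
    must carry so that two partial results can be joined.
    """
    s = sum(row)
    return (s, max(0, s))


def join(left, right):
    """Combine the results of two adjacent chunks of rows."""
    l_total, l_best = left
    r_total, r_best = right
    # A best strip either ends in the left chunk, or covers the whole left
    # chunk and ends somewhere in the right chunk.
    return (l_total + r_total, max(l_best, l_total + r_best))


def solve(a, lo, hi):
    """Divide-and-conquer over rows a[lo:hi], with hi > lo."""
    if hi - lo == 1:
        return leaf(a[lo])
    mid = (lo + hi) // 2
    # The two recursive calls are independent, so a parallel runtime
    # could evaluate them concurrently; here they run one after the other.
    return join(solve(a, lo, mid), solve(a, mid, hi))


def max_top_strip_dc(a):
    if not a:
        return 0
    return solve(a, 0, len(a))[1]


if __name__ == "__main__":
    grid = [[1, -2], [3, 4], [-10, 1], [5, 5]]
    print(max_top_strip_seq(grid), max_top_strip_dc(grid))
```

Without the extra `total` component, the two halves could not be combined: the best strip of the right half alone says nothing about strips that also cover the left half. This is the kind of augmentation the abstract refers to, shown here only for one simple case.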

Supplemental Material

p610-farzan.webm (WebM, 60.4 MB)


Published in

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019
1162 pages
ISBN: 9781450367127
DOI: 10.1145/3314221

Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 406 of 2,067 submissions, 20%
