Polyhedral Optimizations for a Data-Flow Graph Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9519)

Abstract

This paper proposes a novel optimization framework for the Data-Flow Graph Language (DFGL), a dependence-based notation for the macro-dataflow model that can be used as an embedded domain-specific language. Our optimization framework follows a “dependence-first” approach in capturing the semantics of DFGL programs in polyhedral representations, as opposed to the standard polyhedral approach of deriving dependences from access functions and schedules. As a first step, our proposed framework performs two important legality checks on an input DFGL program: checking for potential violations of the single-assignment rule, and checking for potential deadlocks. After these legality checks are performed, the DFGL dependence information is used in lieu of standard polyhedral dependences to enable polyhedral transformations and code generation, which include automatic loop transformations, tiling, and code generation of parallel loops with coarse-grain (fork-join) and fine-grain (doacross) synchronizations. Our performance experiments with nine benchmarks on Intel Xeon and IBM Power7 multicore processors show that the DFGL versions optimized by our proposed framework can deliver up to 6.9× performance improvement relative to standard OpenMP versions of these benchmarks. To the best of our knowledge, this is the first system to encode explicit macro-dataflow parallelism in polyhedral representations so as to provide programmers with an easy-to-use DSL notation with legality checks, while taking full advantage of the optimization functionality in state-of-the-art polyhedral frameworks.
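The two legality checks named in the abstract can be illustrated on a toy scale. The sketch below is not the paper's method: the framework reasons symbolically over polyhedral representations, whereas this example simply enumerates step instances for a small fixed range. The function names, the (step, key) tuple encoding, and the example steps S and T are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def single_assignment_violations(writes):
    """Return {item_key: [writers]} for every item written more than once."""
    writers = defaultdict(list)
    for step, key in writes:
        writers[key].append(step)
    return {key: steps for key, steps in writers.items() if len(steps) > 1}


def find_deadlock(reads, writes):
    """Return (cycle, unproduced_reads) for the instance-level dependence graph.

    cycle is a list of step instances forming a dependence cycle, or None.
    unproduced_reads lists (step, key) pairs whose item no step ever writes.
    """
    producer = {key: step for step, key in writes}
    deps = defaultdict(set)  # consumer -> set of producers it waits on
    unproduced = []
    for step, key in reads:
        if key in producer:
            deps[step].add(producer[key])
        else:
            unproduced.append((step, key))
    try:
        TopologicalSorter(deps).prepare()  # raises CycleError on a cycle
    except CycleError as err:
        return list(err.args[1]), unproduced
    return None, unproduced


N = 4
# Legal chain: S(i) writes A[i] and, for i > 0, reads A[i-1].
chain_writes = [(("S", i), ("A", i)) for i in range(N)]
chain_reads = [(("S", i), ("A", i - 1)) for i in range(1, N)]

# Illegal: S(i) writes A[i // 2], so two instances write the same item.
clash_writes = [(("S", i), ("A", i // 2)) for i in range(N)]

# Illegal: S(0) needs B[0] from T(0), while T(0) needs A[0] from S(0).
cyc_writes = [(("S", 0), ("A", 0)), (("T", 0), ("B", 0))]
cyc_reads = [(("S", 0), ("B", 0)), (("T", 0), ("A", 0))]
```

Calling `single_assignment_violations(clash_writes)` reports both doubly written items, and `find_deadlock(cyc_reads, cyc_writes)` returns a cycle through S(0) and T(0). Enumeration only works for small, known ranges; a symbolic formulation is what allows such checks on parametric iteration spaces.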

Notes

  1. Step I/O may comprise a list of items, and item keys may include range expressions.

  2. A typical case is an env step that creates a set of step instances where the tag is a range.

  3. In future work, we may consider not treating this case as an error condition, by assuming that each data item that is not written in the DFGL region has an initializing write that is instead performed by the environment.

  4. MKL is the best tuned library for Intel platforms. We compare against Sequential and Parallel MKL.

  5. On POWER7 we use ATLAS, the sequential library, because MKL cannot run on POWER7 and a parallel library was not available.

Acknowledgments

This work was supported in part by the National Science Foundation through awards 0926127 and 1321147.

Author information

Corresponding author

Correspondence to Alina Sbîrlea.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Sbîrlea, A., Shirako, J., Pouchet, L.-N., Sarkar, V. (2016). Polyhedral Optimizations for a Data-Flow Graph Language. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_4

  • DOI: https://doi.org/10.1007/978-3-319-29778-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29777-4

  • Online ISBN: 978-3-319-29778-1

  • eBook Packages: Computer Science, Computer Science (R0)