Skip to main content

QUARC: An Array Programming Approach to High Performance Computing

  • Conference paper
  • First Online:
Book cover Languages and Compilers for Parallel Computing (LCPC 2016)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10136))

  • 992 Accesses

Abstract

We present QUARC, a framework for the optimized compilation of domain-specific extensions to C++. Driven by needs for programmer productivity and portable performance for lattice QCD, the framework focuses on stencil-like computations on arrays with an arbitrary number of dimensions. QUARC uses a template meta-programming front end to define a high-level array language. Unlike approaches that generate scalarized loop nests in the front end, the instantiation of QUARC templates retains high-level abstraction suitable for optimization at the object (array) level. The back end compiler (CLANG/LLVM) is extended to implement array transformations such as transposition, reshaping, and partitioning for parallelism and for memory locality prior to scalarization. We present the design and implementation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., Yelick, K.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC 2008, pp. 4:1–4:12. IEEE Press, Piscataway (2008). http://dl.acm.org/citation.cfm?id=1413370.1413375

  2. Edwards, H.C., Trott, C.R.: Kokkos: enabling performance portability across manycore architectures. In: Proceedings of the 2013 Extreme Scaling Workshop (XSW 2013), XSW 2013, pp. 18–24 (2013). http://dx.doi.org/10.1109/XSW.2013.7

  3. Estérie, P., Gaunard, M., Falcou, J., Lapresté, J.T., Rozoy, B.: Boost.SIMD: generic programming for portable SIMDization. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT 2012, pp. 431–432. ACM, New York (2012). http://doi.acm.org/10.1145/2370816.2370881

  4. Härdtlein, J., Pflaum, C., Linke, A., Wolters, C.H.: Advanced expression templates programming. Comput. Vis. Sci. 13(2), 59–68 (2009). http://dx.doi.org/10.1007/s00791-009-0128-2

    Article  Google Scholar 

  5. Henretty, T., Veras, R., Franchetti, F., Pouchet, L.N., Ramanujam, J., Sadayappan, P.: A stencil compiler for short-vector SIMD architectures. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing - ICS 2013, p. 13 (2013). http://dl.acm.org/citation.cfm?doid=2464996.2467268

  6. Iglberger, K., Hager, G., Treibig, J., Rüde, U.: Expression templates revisited: a performance analysis of current methodologies. SIAM J. Sci. Comput. 34(2), C42–C69 (2012). http://dx.doi.org/10.1137/110830125

    Article  MathSciNet  Google Scholar 

  7. Intel Corporation: Intel Threading Building Blocks (2016)

    Google Scholar 

  8. Iverson, K.E.: Notation as a tool of thought. Commun. ACM 23(8), 444–465 (1980). http://doi.acm.org/10.1145/358896.358899

    Article  MathSciNet  Google Scholar 

  9. Joo, B., Smelyanskiy, M., Kalamkar, D.D., Vaidyanathan, K.: Wilson Dslash kernel from lattice QCD optimization, July 2015. http://www.osti.gov/scitech/servlets/purl/1223094

  10. Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: Proceedings of the 2005 Workshop on Memory System Performance, MSP 2005, pp. 36–43. ACM, New York (2005). http://doi.acm.org/10.1145/1111583.1111589

  11. Kennedy, K., Broom, B., Chauhan, A., Fowler, R.J., Garvin, J., Koelbel, C., Mccosh, C., Mellor-Crummey, J.: Telescoping languages: a system for automatic generation of domain languages. Proc. IEEE 93(2), 387–408 (2005)

    Article  Google Scholar 

  12. Majeti, D., Barik, R., Zhao, J., Grossman, M., Sarkar, V.: Compiler-driven data layout transformation for heterogeneous platforms. In: Mey, D., et al. (eds.) Euro-Par 2013. LNCS, vol. 8374, pp. 188–197. Springer, Heidelberg (2014). doi:10.1007/978-3-642-54420-0_19

    Chapter  Google Scholar 

  13. Maslov, V.: Delinearization: an efficient way to break multiloop dependence equations. In: Proceedings of the SIGPLAN 1992 Conference on Programming Language Design and Implementation, pp. 152–161 (1992)

    Google Scholar 

  14. More, T.: Axioms and theorems for a theory of arrays. IBM J. Res. Dev. 17(2), 135–175 (1973). http://dx.doi.org/10.1147/rd.172.0135

    Article  MathSciNet  MATH  Google Scholar 

  15. Mullin, L.: A mathematics of arrays. Ph.D. thesis, Syracuse University, December 1988

    Google Scholar 

  16. Roth, G., Mellor-Crummey, J., Kennedy, K., Brickner, R.G.: Compiling stencils in high performance fortran. In: Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, SC 1997. pp. 1–20. ACM, New York (1997). http://doi.acm.org/10.1145/509593.509605

  17. Haney, S., Crotinger, J., Karmesin, S., Smith, S.: Easy expression templates using PETE, the Portable Expression Template Engine. Technical report LA-UR-99-777 (1999)

    Google Scholar 

  18. Sung, I.J., Stratton, J.A., Hwu, W.M.W.: Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT 2010, pp. 513–522. ACM, New York (2010). http://doi.acm.org/10.1145/1854273.1854336

  19. Tang, Y., Chowdhury, R.A., Kuszmaul, B.C., Luk, C.K., Leiserson, C.E.: The pochoir stencil compiler. In: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2011, pp. 117–128. ACM, New York (2011). http://doi.acm.org/10.1145/1989493.1989508

  20. USQCD: QDP++ (2002). http://usqcd-software.github.io/qdpxx/

  21. Veldhuizen, T.: Expression templates. C++ Report 7, 26–31 (1995)

    Google Scholar 

  22. Verdoolaege, S.: isl: an integer set library for the polyhedral model. In: Fukuda, K., Hoeven, J., Joswig, M., Takayama, N. (eds.) ICMS 2010. LNCS, vol. 6327, pp. 299–302. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15582-6_49

    Chapter  Google Scholar 

  23. Winter, F.T., Clark, M.A., Edwards, R.G., Joo, B.: A framework for lattice QCD calculations on GPUs. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE, May 2014. http://dx.doi.org/10.1109/IPDPS.2014.112

  24. Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, PLDI 1991, pp. 30–44. ACM, New York (1991). http://doi.acm.org/10.1145/113445.113449

  25. Xu, S., Gregg, D.: Semi-automatic composition of data layout transformations for loop vectorization. In: Hsu, C.-H., Shi, X., Salapura, V. (eds.) NPC 2014. LNCS, vol. 8707, pp. 485–496. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44917-2_40

    Google Scholar 

  26. Yan, Y., Lin, P.H., Liao, C., de Supinski, B.R., Quinlan, D.J.: Supporting multiple accelerators in high-level programming models. In: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015, pp. 170–180, ACM, New York (2015). http://doi.acm.org/10.1145/2712386.2712405

Download references

Acknowledgement

This work was supported in part by the DOE Office of Science SciDAC program on grants DE-FG02-11ER26050/DE-SC0006925 and DE-SC0008706.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Diptorup Deb .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Deb, D., Fowler, R.J., Porterfield, A. (2017). QUARC: An Array Programming Approach to High Performance Computing. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science(), vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-52709-3_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52708-6

  • Online ISBN: 978-3-319-52709-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics