Capturing the Expert: Generating Fast Matrix-Multiply Kernels with Spiral

  • Conference paper
High Performance Computing for Computational Science -- VECPAR 2014 (VECPAR 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Abstract

Matrix-Matrix Multiplication (MMM) is a fundamental operation in scientific computing. Achieving peak floating-point performance with this operation requires expert knowledge of both linear algebra and computer architecture to craft an implementation tuned for a given microarchitecture. To do this, an expert first derives an algorithm from models found in the literature. The expert then applies optimizations that are well suited to the target architecture. Lastly, the expert expresses that implementation in assembly code. In this paper, we argue that this process is mechanical and can be captured in a rule-based program generation system such as Spiral. We then show that, given this machinery, Spiral can generate MMM implementations for large problem sizes that are competitive with hand-tuned code.
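The abstract's first step, deriving an algorithm from models in the literature, can be made concrete with the layered blocking scheme described by Goto and van de Geijn and used in BLIS: block C, A, and B so that a panel of B stays in the L3 cache and a block of A stays in the L2 cache, pack both into contiguous buffers, and let a small register-blocked micro-kernel do all the arithmetic. The sketch below is a portable C illustration of that scheme under stated assumptions. It is not the code Spiral generates, and the blocking constants (MR, NR, MC, KC, NC) and the function names are hypothetical placeholders. An expert, or a generator, would derive those constants from the register file and cache sizes and would replace `micro_kernel` with vectorized assembly, which is the last step the abstract describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocking parameters, for illustration only. Real values are
   derived from the target's register count and cache sizes. MC must be a
   multiple of MR and NC a multiple of NR so the packing buffers fit. */
#define MR 4
#define NR 4
#define MC 64
#define KC 128
#define NC 256

/* Column-major access: element (i, j) of a matrix with leading dimension ld. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Pack an mc x kc block of A into row panels of height MR. Within a panel,
   the MR entries of each column are contiguous. Rows past the edge are
   zero-padded so the micro-kernel always works on a full MR x NR tile. */
static void pack_a(int mc, int kc, const double *A, int lda, double *Ap) {
    for (int i0 = 0; i0 < mc; i0 += MR)
        for (int p = 0; p < kc; ++p)
            for (int i = 0; i < MR; ++i)
                *Ap++ = (i0 + i < mc) ? A[IDX(i0 + i, p, lda)] : 0.0;
}

/* Pack a kc x nc block of B into column panels of width NR (zero-padded). */
static void pack_b(int kc, int nc, const double *B, int ldb, double *Bp) {
    for (int j0 = 0; j0 < nc; j0 += NR)
        for (int p = 0; p < kc; ++p)
            for (int j = 0; j < NR; ++j)
                *Bp++ = (j0 + j < nc) ? B[IDX(p, j0 + j, ldb)] : 0.0;
}

/* Micro-kernel: compute an MR x NR tile as kc rank-1 updates held in a
   local accumulator, then add only the valid mr x nr corner into C. */
static void micro_kernel(int kc, const double *Ap, const double *Bp,
                         double *C, int ldc, int mr, int nr) {
    double acc[MR][NR] = {{0.0}};
    for (int p = 0; p < kc; ++p)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += Ap[p * MR + i] * Bp[p * NR + j];
    for (int i = 0; i < mr; ++i)
        for (int j = 0; j < nr; ++j)
            C[IDX(i, j, ldc)] += acc[i][j];
}

/* C (m x n) += A (m x k) * B (k x n), all column-major. */
void gemm_blocked(int m, int n, int k, const double *A, int lda,
                  const double *B, int ldb, double *C, int ldc) {
    static double Ap[MC * KC]; /* packed block of A */
    static double Bp[KC * NC]; /* packed panel of B */
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int jc = 0; jc < n; jc += NC) {           /* loop 5: NC columns of B, C */
        int nc = (n - jc < NC) ? n - jc : NC;
        for (int pc = 0; pc < k; pc += KC) {       /* loop 4: KC slice of k */
            int kc = (k - pc < KC) ? k - pc : KC;
            pack_b(kc, nc, &B[IDX(pc, jc, ldb)], ldb, Bp);
            for (int ic = 0; ic < m; ic += MC) {   /* loop 3: MC rows of A, C */
                int mc = (m - ic < MC) ? m - ic : MC;
                pack_a(mc, kc, &A[IDX(ic, pc, lda)], lda, Ap);
                for (int jr = 0; jr < nc; jr += NR)      /* loop 2 */
                    for (int ir = 0; ir < mc; ir += MR)  /* loop 1 */
                        micro_kernel(kc, &Ap[ir * kc], &Bp[jr * kc],
                                     &C[IDX(ic + ir, jc + jr, ldc)], ldc,
                                     (mc - ir < MR) ? mc - ir : MR,
                                     (nc - jr < NR) ? nc - jr : NR);
            }
        }
    }
}
```

For contiguous column-major arrays, a call such as `gemm_blocked(m, n, k, A, m, B, k, C, m)` adds A*B into C. The static packing buffers keep the sketch short but make it non-reentrant; a production kernel would typically allocate aligned buffers per thread. The point of the layering is that every loop except the micro-kernel is fixed boilerplate, so the architecture-specific expertise concentrates in a few constants and one small kernel, which is what makes the process amenable to rule-based generation.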

Author information

Correspondence to Richard Veras.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Veras, R., Franchetti, F. (2015). Capturing the Expert: Generating Fast Matrix-Multiply Kernels with Spiral. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_20

  • DOI: https://doi.org/10.1007/978-3-319-17353-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17352-8

  • Online ISBN: 978-3-319-17353-5

  • eBook Packages: Computer Science, Computer Science (R0)
