
Generalizing Matrix Multiplication for Efficient Computations on Modern Computers

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7203)

Abstract

Recent advances in computing allow us to take a new look at matrix multiplication. The key ideas are: a decreasing interest in recursion, the development of processors with thousands (potentially millions) of processing units, and influences from the Algebraic Path Problem. In this context, we propose a generalized matrix-matrix multiply-add (MMA) operation and illustrate its usability. Furthermore, we elaborate on the interrelation between this generalization and the BLAS standard.
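
The abstract does not define the operation explicitly; in the algebraic-path-problem literature, a generalized MMA is usually understood as the update C ← C ⊕ (A ⊗ B) over a user-chosen semiring (⊕, ⊗). The following C sketch illustrates that reading only; the semiring struct, the function names, and the (min, +) instantiation are assumptions introduced here for illustration, not the authors' interface.

#include <float.h>
#include <stdio.h>

/* A semiring is given here by an "add" (reduction) and a "multiply" operation
 * together with the identity of "add".  (plus, times, 0) recovers the classical
 * multiply-add; (min, plus, +inf) gives the all-pairs shortest-paths update.
 * These names are illustrative, not taken from the paper. */
typedef struct {
    double (*add)(double, double);   /* generalized + */
    double (*mul)(double, double);   /* generalized x */
    double add_identity;             /* identity element of the generalized + */
} semiring;

static double plus(double a, double b)    { return a + b; }
static double times(double a, double b)   { return a * b; }
static double minimum(double a, double b) { return a < b ? a : b; }

/* Generalized matrix multiply-add: C <- C (+) (A (x) B) for n x n matrices
 * stored in row-major order. */
static void generalized_mma(const semiring *s, int n,
                            const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = s->add_identity;
            for (int k = 0; k < n; ++k)
                acc = s->add(acc, s->mul(A[i * n + k], B[k * n + j]));
            C[i * n + j] = s->add(C[i * n + j], acc);
        }
}

int main(void)
{
    const semiring arithmetic = { plus, times, 0.0 };       /* ordinary MMA    */
    const semiring tropical   = { minimum, plus, DBL_MAX }; /* (min, +) update */

    double A[4] = { 1, 2, 3, 4 };
    double B[4] = { 5, 6, 7, 8 };
    double C[4] = { 0, 0, 0, 0 };
    double D[4] = { DBL_MAX, DBL_MAX, DBL_MAX, DBL_MAX };

    generalized_mma(&arithmetic, 2, A, B, C);  /* C = C + A*B                 */
    generalized_mma(&tropical,   2, A, B, D);  /* D = min(D, min-plus(A, B))  */

    printf("C[0][0] = %g, D[0][0] = %g\n", C[0], D[0]);  /* 19 and 6 */
    return 0;
}

With (⊕, ⊗) = (+, ×) the call reduces to the familiar GEMM-style update C ← C + AB (the α = β = 1 case of the Level-3 BLAS xGEMM), which is presumably the point of contact with the BLAS standard mentioned in the abstract; with (min, +) the same loop nest performs one step of an all-pairs shortest-paths computation.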





Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sedukhin, S.G., Paprzycki, M. (2012). Generalizing Matrix Multiplication for Efficient Computations on Modern Computers. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_23


  • DOI: https://doi.org/10.1007/978-3-642-31464-3_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31463-6

  • Online ISBN: 978-3-642-31464-3

