
Parallel Complexity of Matrix Multiplication

Published in The Journal of Supercomputing.

Abstract

Effective design of parallel matrix multiplication algorithms depends on many interdependent issues: the underlying parallel machine or network on which an algorithm will run, as well as the methodology the algorithm employs. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and networks. In other words, we provide an achievable parallel run-time that cannot be beaten by any algorithm, known or unknown, for this problem; moreover, any algorithm that claims to be optimal must attain this run-time. To obtain results that are general and useful across a span of machines, we base our analysis on the well-known LogP model. Three criteria determine the running time of a parallel algorithm: (i) the local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We establish optimality by first proving general lower bounds on parallel run-time, which yield significant insights into (i)–(iii); in particular, we identify which types of data layouts and communication schedules are needed to achieve optimal run-times. We prove that no single data layout achieves optimal running times in all cases; rather, the optimal layout depends on the dimensions of each matrix and on the number of processors. Lastly, we present optimal algorithms.
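To illustrate the kind of cost accounting the LogP model supports, the following is a minimal sketch in Python. It is not the paper's actual bound: the function, the even block partitioning, and the single-message communication term are simplifying assumptions for exposition only, whereas the paper's lower bounds treat data layouts and communication schedules in full detail. The LogP parameters are L (latency), o (per-message overhead), g (gap between messages), and P (processors).

```python
# Illustrative LogP-style run-time estimate for C = A * B, where A is m x k
# and B is k x n, distributed over P processors. Hypothetical model for
# exposition: even work split, one aggregate message per processor.
import math

def logp_mm_estimate(m, k, n, P, L, o, g, flop_time):
    """Rough estimate: local computation plus the cost of each processor
    receiving its share of the operand matrices under LogP."""
    # Local computation: the 2*m*k*n flops of the classical algorithm,
    # split evenly across P processors (an idealized layout).
    compute = 2 * m * k * n * flop_time / P
    # Communication: each processor handles roughly (m*k + k*n)/P words;
    # under LogP, a w-word message costs about L + 2*o + (w - 1)*g.
    words = math.ceil((m * k + k * n) / P)
    communicate = L + 2 * o + (words - 1) * g
    return compute + communicate

# Example: 1024 x 1024 square matrices on 64 processors.
t = logp_mm_estimate(1024, 1024, 1024, 64, L=5.0, o=2.0, g=1.0,
                     flop_time=1e-3)
```

Even this toy estimate shows the tension the paper analyzes: the computation term shrinks with P while the communication term depends on how the operands are laid out, so the best achievable run-time varies with the matrix dimensions and the processor count.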




Santos, E.E. Parallel Complexity of Matrix Multiplication. The Journal of Supercomputing 25, 155–175 (2003). https://doi.org/10.1023/A:1023996628662
