A New Direction to Parallelize Winograd’s Algorithm on Distributed Memory Computers

  • Conference paper
  • Modeling, Simulation and Optimization of Complex Processes

Abstract

Winograd’s algorithm for multiplying two n × n matrices reduces the asymptotic operation count from the O(n^3) of the traditional algorithm to O(n^2.81). On distributed memory computers, combining Winograd’s algorithm with parallel matrix multiplication algorithms therefore always gives remarkable results. Within this combination, applying Winograd’s algorithm at the inter-processor level poses more difficult problems, but it leads to more effective algorithms. In this paper, we present a general formulation of these algorithms and introduce a scalable method to implement them on distributed memory computers. This work also opens a new direction for parallelizing Winograd’s algorithm, based on a generalization of Winograd’s formula to the case where the matrices are partitioned into 2k parts (the case k = 2 gives the original formula).
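
The abstract does not reproduce Winograd’s formula, so the following is only a minimal sequential sketch of the commonly cited Winograd variant of Strassen’s method (7 block multiplications and 15 block additions per 2 × 2 partition). It is not the parallel scheme of the paper. The function names (winograd, naive_multiply, and the helpers) are hypothetical, and the code assumes square matrices whose size is a power of two. Recursing down to 1 × 1 blocks uses 7^k scalar multiplications for n = 2^k, which is where the exponent log2(7) ≈ 2.81 comes from.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix of even size into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


def naive_multiply(A, B):
    """Traditional O(n^3) product, used here only as a reference."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def winograd(A, B, count):
    """Winograd-variant recursion; assumes len(A) is a power of two.

    count is a one-element list that accumulates scalar multiplications.
    """
    if len(A) == 1:
        count[0] += 1
        return [[A[0][0] * B[0][0]]]

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # 8 pre-additions on the operands
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    T1 = sub(B12, B11)
    T2 = sub(B22, T1)
    T3 = sub(B22, B12)
    T4 = sub(T2, B21)

    # 7 recursive block multiplications
    P1 = winograd(A11, B11, count)
    P2 = winograd(A12, B21, count)
    P3 = winograd(S4, B22, count)
    P4 = winograd(A22, T4, count)
    P5 = winograd(S1, T1, count)
    P6 = winograd(S2, T2, count)
    P7 = winograd(S3, T3, count)

    # 7 post-additions to form the result quadrants
    U2 = add(P1, P6)
    U3 = add(U2, P7)
    U4 = add(U2, P5)
    C11 = add(P1, P2)
    C12 = add(U4, P3)
    C21 = sub(U3, P4)
    C22 = add(U3, P5)
    return join(C11, C12, C21, C22)


if __name__ == "__main__":
    A = [[i * 4 + j for j in range(4)] for i in range(4)]
    B = [[(i + 1) * (j + 2) for j in range(4)] for i in range(4)]
    count = [0]
    C = winograd(A, B, count)
    print(C == naive_multiply(A, B), count[0])
```

For a 4 × 4 input the recursion has two levels, so the counter reaches 7^2 = 49 scalar multiplications, compared with 4^3 = 64 for the traditional algorithm. A practical implementation would normally stop the recursion at a larger block size and switch to a conventional kernel there.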

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

Cite this paper

Nguyen, D.K., Lavallee, I., Bui, M. (2008). A New Direction to Parallelize Winograd’s Algorithm on Distributed Memory Computers. In: Bock, H.G., Kostina, E., Phu, H.X., Rannacher, R. (eds) Modeling, Simulation and Optimization of Complex Processes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79409-7_31
