Skip to main content
Log in

A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

A new parallel implementation of Strassen’s matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D, all-port torus interconnection networks. The proposed algorithm employs a special conflict-free routing pattern for better scalability and is able to yield a performance rate very close to the theoretical bound for many practical network and matrix sizes. It effectively scales up to very large networks typically containing hundreds-of-thousands processors where petaflop or exaflop processing rates are sought.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Al Bdaiwi BF, Bose B (2003) On resource placements in 3D tori. J Parallel Distrib Comput 63(9):838–845

    Article  Google Scholar 

  2. Baransel C, İmre K, Artuner H (2010) New parallel matrix multiplication algorithms for wormhole-routed al-port 2D/3D torus networks. In: Mathematical methods in engineering international symposium (MME ’10), Coimbra, Portugal, 21–24 October, 2010, ISBN 978-989-8331-11-3

    Google Scholar 

  3. Bini DA (2007) Chapter 47: Fast matrix multiplication. In: Hogben L (ed) Handbook of linear algebra. Chapman & Hall/CRC Press, London/New York

    Google Scholar 

  4. Bodrato M (2010) A Strassen-like matrix multiplication suited for squaring and higher power computation. In: Proceedings of the 2010 international symposium on symbolic and algebraic computation (ISSAC’10). ACM, New York, pp 273–280

    Chapter  Google Scholar 

  5. Boyer B, Dumas J, Pernet C, Zhou W (2009) Memory efficient scheduling of Strassen–Winograd’s matrix multiplication algorithm. In: Proceedings of the 2009 international symposium on symbolic and algebraic computation (ISSAC’09). ACM, New York, pp 55–62

    Chapter  Google Scholar 

  6. Chou C, Deng Y, Li G, Wang Y (1995) Parallelizing Strassen’s method for matrix multiplication on distributed-memory MIMD architectures. Comput Math Appl 30(2):49–69

    Article  MathSciNet  MATH  Google Scholar 

  7. Desprez F, Suter F (2002) Impact of mixed-parallelism on parallel implementations of Strassen and Winograd matrix multiplication algorithms, INRIA Research Report No 4482

  8. Dimitrescu B, Roch JL, Trystram D (1994) Fast matrix multiplication algorithms on MIMD architectures. Parallel Algorithms Appl 4(2):53–70

    Article  Google Scholar 

  9. Gramma A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Addison-Wesley, Reading

    Google Scholar 

  10. Gastinel N (1971) Sur le calcul des produits de matrices. Numer Math 17:222–229

    Article  MathSciNet  MATH  Google Scholar 

  11. Grayson B, Shah A, van de Geijn R (1995) A high performance parallel Strassen implementation. Dept. of Computer Sciences, The University of Texas, TR-95-24, June 1995

  12. İmre KM, Baransel C, Artuner H (2011) Efficient and scalable routing algorithms for collective communication operations on 2D all-port torus networks. Int J Parallel Program 39(6):746–782

    Article  Google Scholar 

  13. Hopcroft J, Kerr L (1971) On minimizing the number of multiplications necessary for matrix multiplication. SIAM J Appl Math 20(1):30–36

    Article  MathSciNet  MATH  Google Scholar 

  14. Hunold S, Rauber T, Rünger G (2008) Combining building blocks for parallel multi-level matrix multiplication. Parallel Comput 34:411–426

    Article  MathSciNet  Google Scholar 

  15. Kolen JF, Bruce P (2001) Evolutionary search for matrix multiplication algorithms. In: FLAIRS conference, pp. 161–165

    Google Scholar 

  16. Laderman JD (1976) A noncommutative algorithm for multiplying 3×3 matrices using 23 multiplications. Bull Am Math Soc 82(1):126–128

    Article  MathSciNet  MATH  Google Scholar 

  17. Luo Q, Drake JB (1995) A scalable parallel Strassen’s matrix multiplication algorithm for distributed-memory computers. In: Proceedings of the 1995 ACM symposium on applied computing. ACM Press, New York, pp 221–226

    Chapter  Google Scholar 

  18. Ni LM, McKinley PK (1993) A survey of wormhole routing techniques in direct networks. Computer 26(2):62–76

    Article  Google Scholar 

  19. Oh S, Moon B (2010) Automatic reproduction of a genius algorithm: Strassen’s algorithm revisited by genetic search. IEEE Trans Evol Comput 14(2):246–251

    Article  Google Scholar 

  20. Ohtaki Y, Takahashi D, Boku T, Sato M (2004) Parallel implementation of Strassen’s matrix multiplication algorithm for heterogeneous clusters. In: Proceedings of the 18th IEEE international parallel and distributed processing symposium (IPDPS ’04)

    Google Scholar 

  21. Song F, Dongarra J, Moore S (2006) Experiments with Strassen’s algorithm: from sequential to parallel. In: Proceedings of the 18th IASTED international conference on parallel and distributed computing and systems (PDCS ’06)

    Google Scholar 

  22. Strassen V (1969) Gaussian elimination is not optimal. Numer Math 13:354–356

    Article  MathSciNet  MATH  Google Scholar 

  23. Winograd S (1971) On multiplication of 2×2 matrices. Linear Algebra Appl 4:381–388

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kayhan M. İmre.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Baransel, C., İmre, K.M. A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks. J Supercomput 62, 486–509 (2012). https://doi.org/10.1007/s11227-011-0730-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-011-0730-1

Keywords

Navigation