A Paradigm for Parallel Matrix Algorithms:

Wise, David S.; Citro, Craig; Hursey, Joshua; Liu, Fang; Rainey, Michael

doi:10.1007/11549468_76

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3648)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The underlying philosophy is to use block recursion as the exclusive control structure, down to a 2^p × 2^p base case anyway, where hardware favors iterative style to fill its pipe. Use of Morton-ordered matrices yields excellent locality within the memory hierarchy, including block sharing among distributed computers. The recursion generalizes nicely to an SPMD program where such sharing is the only communication.
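As a rough illustration of the Morton ordering mentioned above, the sketch below interleaves the bits of a row index and a column index to obtain a position in a linear array. The names `dilate`, `morton_index`, and `morton_order` are hypothetical, and placing row bits in the odd positions is an assumed convention; the chapter itself may assign the bits differently.

```python
def dilate(x: int, bits: int = 16) -> int:
    """Spread the low `bits` bits of x so that bit k lands in position 2k."""
    result = 0
    for k in range(bits):
        result |= ((x >> k) & 1) << (2 * k)
    return result


def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave bits: row bits in odd positions, column bits in even ones.

    This bit assignment is an assumption made for the sketch.
    """
    return (dilate(row, bits) << 1) | dilate(col, bits)


def morton_order(n: int):
    """Return the (row, col) cells of an n x n matrix sorted by Morton index."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: morton_index(rc[0], rc[1]))
```

Under this assumed convention, the first four positions of a 4 × 4 matrix are the cells of its northwest 2 × 2 quadrant, so each quadrant (and, recursively, each sub-quadrant) occupies a contiguous range of addresses.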

Cholesky factorization of an n × n SPD matrix is used as a simple nontrivial example to expose the paradigm. The program amounts to four functions, two of which are finalizers for the other two. This insight allows final blocks to be shared with inter-node communication ∈ Θ(n²) for this algorithm ∈ Θ(n³) flops.
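To make the idea of block recursion as the only control structure concrete, here is a minimal sequential sketch of a textbook block-recursive Cholesky factorization. It is not the chapter's four-function program: the three routines and their names (`mul_sub`, `tri_solve`, `cholesky`, plus the wrapper `cholesky_lower`) are hypothetical, the matrix is a plain list of rows rather than a Morton-ordered array, the order is assumed to be a power of two, and the recursion runs down to 1 × 1 blocks instead of an iterative 2^p × 2^p base case.

```python
import math


def mul_sub(a, cr, cc, xr, xc, yr, yc, n):
    """C -= X * Y^T for n x n blocks of `a`, each given by its top-left corner."""
    if n == 1:
        a[cr][cc] -= a[xr][xc] * a[yr][yc]
        return
    h = n // 2
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                mul_sub(a, cr + i, cc + j, xr + i, xc + k, yr + j, yc + k, h)


def tri_solve(a, r, c, t, n):
    """Overwrite block B at (r, c) with X, where X * T^T = B.

    T is the lower-triangular n x n block whose corner is (t, t).
    """
    if n == 1:
        a[r][c] /= a[t][t]
        return
    h = n // 2
    tri_solve(a, r, c, t, h)                         # X11
    tri_solve(a, r + h, c, t, h)                     # X21
    mul_sub(a, r, c + h, r, c, t + h, t, h)          # B12 -= X11 * T21^T
    mul_sub(a, r + h, c + h, r + h, c, t + h, t, h)  # B22 -= X21 * T21^T
    tri_solve(a, r, c + h, t + h, h)                 # X12
    tri_solve(a, r + h, c + h, t + h, h)             # X22


def cholesky(a, lo, n):
    """Factor the n x n diagonal block at (lo, lo) in place (lower triangle)."""
    if n == 1:
        a[lo][lo] = math.sqrt(a[lo][lo])
        return
    h = n // 2
    cholesky(a, lo, h)                               # L11
    tri_solve(a, lo + h, lo, lo, h)                  # L21 = A21 * L11^-T
    mul_sub(a, lo + h, lo + h, lo + h, lo, lo + h, lo, h)  # A22 -= L21 * L21^T
    cholesky(a, lo + h, h)                           # L22


def cholesky_lower(matrix):
    """Return L with L * L^T == matrix; the order must be a power of two."""
    n = len(matrix)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("order must be a power of two in this sketch")
    a = [list(map(float, row)) for row in matrix]
    cholesky(a, 0, n)
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

For example, `cholesky_lower([[4.0, 2.0], [2.0, 5.0]])` yields `[[2.0, 0.0], [1.0, 2.0]]`. The Schur-complement update here touches all of A22 for simplicity; a symmetric update restricted to the lower triangle would halve that work.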

Supported, in part, by the National Science Foundation under grants numbered CCR-0073491, ACI–0219884, and EIA–0202048. Copyright on twelve pages intact transferred, with rights reserved for anyone to make digital or hard copies of part or all of this work for personal or classroom use, provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full Springer citation on the first page. Rights are similarly reserved for any library to share a hard copy through interlibrary loan.

Author information

Authors and Affiliations

Indiana University, Bloomington
David S. Wise, Craig Citro, Joshua Hursey, Fang Liu & Michael Rainey

Editor information

Editors and Affiliations

Topic Chairs,
José C. Cunha
Faculdade de Ciências e Tecnologia CITI Centre, Quinta da Torre, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
Pedro D. Medeiros

About this paper

Cite this paper

Wise, D.S., Citro, C., Hursey, J., Liu, F., Rainey, M. (2005). A Paradigm for Parallel Matrix Algorithms:. In: Cunha, J.C., Medeiros, P.D. (eds) Euro-Par 2005 Parallel Processing. Euro-Par 2005. Lecture Notes in Computer Science, vol 3648. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11549468_76

DOI: https://doi.org/10.1007/11549468_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28700-1
Online ISBN: 978-3-540-31925-2
eBook Packages: Computer Science, Computer Science (R0)
