ABSTRACT
We investigate the use of functional programming to develop a numerical linear algebra run-time, i.e., a framework in which solvers can be adapted easily to different contexts and task parallelism can be attained (semi-)automatically. We follow a bottom-up strategy: the first step is the design and implementation of a framework layer consisting of a functional version of the BLAS (Basic Linear Algebra Subprograms) routines. The framework supports arbitrary representations of matrices and vectors, and it allows multiple implementations of the BLAS operations, based on different algorithms and parallelism strategies, to be written and combined. Using this framework, we implement a functional version of the Cholesky factorization, which serves as a proof of concept to evaluate the flexibility and performance of our approach.
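To illustrate the kind of functional formulation the abstract refers to, the following is a minimal, self-contained sketch of an unblocked Cholesky factorization in Haskell. It is not the paper's implementation: the paper's framework abstracts over matrix representations and BLAS back-ends, while this sketch pins one concrete representation (row-major nested lists) and the textbook scalar algorithm, using lazy self-reference to express the data dependencies between entries of the factor.

```haskell
-- Illustrative sketch only (hypothetical names): an unblocked Cholesky
-- factorization over a fixed row-major [[Double]] representation.
module Main where

type Matrix = [[Double]]

-- | Lower-triangular factor L of a symmetric positive-definite A,
-- so that A == L * transpose L. The definition of 'l' refers to itself:
-- each entry depends only on entries in earlier columns (and earlier
-- rows on the diagonal), so lazy evaluation resolves the recursion.
cholesky :: Matrix -> Matrix
cholesky a = l
  where
    n = length a
    l = [ [ entry i j | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]
    entry i j
      | j >  i    = 0                                             -- strictly upper part
      | j == i    = sqrt (a !! i !! i
                          - sum [ (l !! i !! k) ^ (2 :: Int) | k <- [0 .. j - 1] ])
      | otherwise = (a !! i !! j
                     - sum [ l !! i !! k * l !! j !! k | k <- [0 .. j - 1] ])
                    / (l !! j !! j)

main :: IO ()
main = mapM_ print (cholesky [[4, 12, -16], [12, 37, -43], [-16, -43, 98]])
-- prints [2.0,0.0,0.0], [6.0,1.0,0.0], [-8.0,5.0,3.0]
```

In the paper's setting, the list representation above would be one instance among many, and the arithmetic on entries would instead be expressed through BLAS-level operations so that blocked and task-parallel variants can be substituted without changing the driver algorithm.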