ABSTRACT
We investigate the use of functional programming to develop a numerical linear algebra run-time, i.e., a framework in which solvers can be adapted easily to different contexts and task parallelism can be attained (semi-)automatically. We follow a bottom-up strategy: the first step is the design and implementation of a framework layer consisting of a functional version of the BLAS (Basic Linear Algebra Subprograms) routines. The framework supports arbitrary representations of matrices and vectors, and it allows multiple implementations of the BLAS operations, based on different algorithms and parallelism strategies, to be written and combined. Using this framework, we implement a functional version of the Cholesky factorization, which serves as a proof of concept to evaluate the flexibility and performance of our approach.
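To illustrate the kind of functional formulation the abstract refers to, the following is a minimal, self-contained sketch of an unblocked Cholesky factorization in Haskell. It is not the paper's implementation: the paper's framework abstracts over matrix representations and BLAS back-ends, while this sketch pins one concrete representation (row-major nested lists) and the textbook scalar algorithm, using lazy self-reference to express the data dependencies between entries of the factor.

```haskell
-- Illustrative sketch only (hypothetical names): an unblocked Cholesky
-- factorization over a fixed row-major [[Double]] representation.
module Main where

type Matrix = [[Double]]

-- | Lower-triangular factor L of a symmetric positive-definite A,
-- so that A == L * transpose L. The definition of 'l' refers to itself:
-- each entry depends only on entries in earlier columns (and earlier
-- rows on the diagonal), so lazy evaluation resolves the recursion.
cholesky :: Matrix -> Matrix
cholesky a = l
  where
    n = length a
    l = [ [ entry i j | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]
    entry i j
      | j >  i    = 0                                             -- strictly upper part
      | j == i    = sqrt (a !! i !! i
                          - sum [ (l !! i !! k) ^ (2 :: Int) | k <- [0 .. j - 1] ])
      | otherwise = (a !! i !! j
                     - sum [ l !! i !! k * l !! j !! k | k <- [0 .. j - 1] ])
                    / (l !! j !! j)

main :: IO ()
main = mapM_ print (cholesky [[4, 12, -16], [12, 37, -43], [-16, -43, 98]])
-- prints [2.0,0.0,0.0], [6.0,1.0,0.0], [-8.0,5.0,3.0]
```

In the paper's setting, the list representation above would be one instance among many, and the arithmetic on entries would instead be expressed through BLAS-level operations so that blocked and task-parallel variants can be substituted without changing the driver algorithm.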