Program Optimization in the Domain of High-Performance Parallelism

Lengauer, Christian

doi:10.1007/978-3-540-25935-0_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3016)

Abstract

I consider the problem of the domain-specific optimization of programs. I review different approaches, discuss their potential, and sketch instances of them from the practice of high-performance parallelism. Readers need not be familiar with high-performance computing.

Author information

Authors and Affiliations

Fakultät für Mathematik und Informatik, Universität Passau, D-94030, Passau, Germany
Christian Lengauer

Editor information

Editors and Affiliations

Department of Informatics and Mathematics, University of Passau
Christian Lengauer
University of Texas at Austin, Austin, Texas, USA
Don Batory
INRIA/LaBRI, Domaine universitaire, 351, cours de la Libération, F-33402, Talence Cedex
Charles Consel
EPFL, 1015, Lausanne, Switzerland
Martin Odersky

About this chapter

Cite this chapter

Lengauer, C. (2004). Program Optimization in the Domain of High-Performance Parallelism. In: Lengauer, C., Batory, D., Consel, C., Odersky, M. (eds) Domain-Specific Program Generation. Lecture Notes in Computer Science, vol 3016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25935-0_5

DOI: https://doi.org/10.1007/978-3-540-25935-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22119-7
Online ISBN: 978-3-540-25935-0
eBook Packages: Springer Book Archive
