Program Optimization in the Domain of High-Performance Parallelism

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3016)

Abstract

I consider the problem of the domain-specific optimization of programs. I review different approaches, discuss their potential, and sketch instances of them from the practice of high-performance parallelism. Readers need not be familiar with high-performance computing.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lengauer, C. (2004). Program Optimization in the Domain of High-Performance Parallelism. In: Lengauer, C., Batory, D., Consel, C., Odersky, M. (eds) Domain-Specific Program Generation. Lecture Notes in Computer Science, vol 3016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25935-0_5

  • DOI: https://doi.org/10.1007/978-3-540-25935-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22119-7

  • Online ISBN: 978-3-540-25935-0

  • eBook Packages: Springer Book Archive
