Framework and Modular Infrastructure for Automation of Architectural Adaptation and Performance Optimization for HPC Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8488)

Abstract

High-performance systems have complex, diverse, and rapidly evolving architectures, and the span of applications, workloads, and resource use patterns is rapidly diversifying. Adapting applications for efficient execution on this spectrum of execution environments is effort-intensive. Many performance optimization tools implement one or several aspects of the full performance optimization task, but almost none are comprehensive across architectures, environments, applications, and workloads. This paper presents, illustrates, and applies a modular infrastructure that enables composition of multiple open-source tools and analyses into a set of workflows implementing comprehensive end-to-end optimization of a diverse spectrum of HPC applications on multiple architectures and for multiple resource types and parallel environments. It gives results from an implementation on the Stampede HPC system at the Texas Advanced Computing Center, where a user can submit an application for optimization using only a single command line and get back an at least partially optimized program, without manual program modification, for two different chips. Currently, only a subset of the possible optimizations is completely automated, but this subset is rapidly growing. Case studies of applications of the workflow are presented. The implementation, currently available for download as version 4.0 of the PerfExpert tool, supports both Sandy Bridge and Intel Phi chips.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Fialho, L., Browne, J. (2014). Framework and Modular Infrastructure for Automation of Architectural Adaptation and Performance Optimization for HPC Systems. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_17

  • DOI: https://doi.org/10.1007/978-3-319-07518-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07517-4

  • Online ISBN: 978-3-319-07518-1

  • eBook Packages: Computer Science (R0)
