Framework and Modular Infrastructure for Automation of Architectural Adaptation and Performance Optimization for HPC Systems

Fialho, Leonardo; Browne, James

doi:10.1007/978-3-319-07518-1_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8488)

Abstract

High-performance systems have complex, diverse, and rapidly evolving architectures. The span of applications, workloads, and resource-use patterns is rapidly diversifying. Adapting applications for efficient execution on this spectrum of execution environments is effort-intensive. Many performance optimization tools implement one or several aspects of the full performance optimization task, but almost none are comprehensive across architectures, environments, applications, and workloads. This paper presents, illustrates, and applies a modular infrastructure that enables composition of multiple open-source tools and analyses into a set of workflows implementing comprehensive end-to-end optimization of a diverse spectrum of HPC applications on multiple architectures and for multiple resource types and parallel environments. It gives results from an implementation on the Stampede HPC system at the Texas Advanced Computing Center, where a user can submit an application for optimization using only a single command line and get back an at least partially optimized program, without manual program modification, for two different chips. Currently, only a subset of the possible optimizations is completely automated, but this subset is rapidly growing. Case studies of applications of the workflow are presented. The implementation currently available for download as the PerfExpert tool version 4.0 supports both Sandy Bridge and Intel Phi chips.

Author information

Authors and Affiliations

Institute for Computational Engineering and Sciences, The University of Texas at Austin, TX, 78712, USA
Leonardo Fialho
Texas Advanced Computing Center, The University of Texas at Austin, TX, 78712, USA
Leonardo Fialho
Department of Computer Science, The University of Texas at Austin, TX, 78712, USA
James Browne

Editor information

Editors and Affiliations

MIN Faculty, Department of Informatics Scientific Computing, University of Hamburg, Bundesstraße 45a, 20146, Hamburg, Germany
Julian Martin Kunkel
Deutsches Klimarechenzentrum, Bundesstraße 45a, 20146, Hamburg, Germany
Thomas Ludwig
Germany and Prometeus GmbH, University of Mannheim, Fliederstraße 2, 74915, Waibstadt, Germany
Hans Werner Meuer

Cite this paper

Fialho, L., Browne, J. (2014). Framework and Modular Infrastructure for Automation of Architectural Adaptation and Performance Optimization for HPC Systems. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_17

DOI: https://doi.org/10.1007/978-3-319-07518-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)