ABSTRACT
Parallel application performance models provide valuable insight into how applications behave on real systems. Tools that deliver fast, accurate, and comprehensive prediction and evaluation of high-performance computing (HPC) applications and system architectures are therefore of considerable value. This paper presents PyPassT, an analysis-based modeling framework built on static program analysis and integrated simulation of target HPC architectures. Specifically, the framework analyzes application source code written in C with OpenACC directives and transforms it into an application model describing the code's computation and communication behavior, including CPU and GPU workloads, memory accesses, and message-passing transactions. The application model is then executed on a simulated HPC architecture for performance analysis. Preliminary experiments demonstrate that the proposed framework can reproduce the runtime behavior of benchmark applications with good accuracy.
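To make the pipeline concrete, here is a minimal, hypothetical sketch (not PyPassT's actual API) of what an analysis-based application model can look like: the static analysis phase would emit abstract compute and communication tasks, which are then "executed" against a set of assumed hardware parameters to estimate runtime. All names and parameter values below are illustrative assumptions.

```python
# Assumed hardware parameters for a simulated node (all values illustrative).
HARDWARE = {
    "cpu_flops": 50e9,   # sustained FLOP/s per core
    "mem_bw":    20e9,   # effective memory bandwidth, bytes/s
    "net_lat":   2e-6,   # network latency, seconds
    "net_bw":    10e9,   # network bandwidth, bytes/s
}

def cpu_task(flops, bytes_moved, hw):
    """Cost of a compute kernel: the larger of its compute-bound
    and memory-bound times (a simple roofline-style estimate)."""
    return max(flops / hw["cpu_flops"], bytes_moved / hw["mem_bw"])

def msg_task(size_bytes, hw):
    """Cost of a point-to-point message: latency plus transfer time."""
    return hw["net_lat"] + size_bytes / hw["net_bw"]

# Application model a static analyzer might produce for one iteration
# of a stencil sweep followed by a halo exchange (workloads assumed).
model = [
    ("stencil_kernel", cpu_task(1e9, 4e8, HARDWARE)),
    ("halo_exchange",  msg_task(8 * 1024, HARDWARE)),
]

predicted = sum(t for _, t in model)
print(f"predicted runtime: {predicted:.6f} s")
```

A real framework of this kind would derive the task list and workload counts from the program's abstract syntax tree rather than hand-coding them, and would replay the tasks inside a discrete-event simulator so that contention and overlap are modeled rather than simply summed.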
Index Terms
- Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations