ABSTRACT
Parallel application performance models provide valuable insight into how applications behave on real systems. Tools that deliver fast, accurate, and comprehensive prediction and evaluation of high-performance computing (HPC) applications and system architectures are therefore of considerable value. This paper presents PyPassT, an analysis-based modeling framework built on static program analysis and integrated simulation of target HPC architectures. Specifically, the framework analyzes application source code written in C with OpenACC directives and transforms it into an application model describing the code's computation and communication behavior, including CPU and GPU workloads, memory accesses, and message-passing transactions. The application model is then executed on a simulated HPC architecture for performance analysis. Preliminary experiments demonstrate that the proposed framework can reproduce the runtime behavior of benchmark applications with good accuracy.
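To make the pipeline concrete, here is a minimal, hypothetical sketch (not PyPassT's actual API) of what an analysis-based application model can look like: the static analysis phase would emit abstract compute and communication tasks, which are then "executed" against a set of assumed hardware parameters to estimate runtime. All names and parameter values below are illustrative assumptions.

```python
# Assumed hardware parameters for a simulated node (all values illustrative).
HARDWARE = {
    "cpu_flops": 50e9,   # sustained FLOP/s per core
    "mem_bw":    20e9,   # effective memory bandwidth, bytes/s
    "net_lat":   2e-6,   # network latency, seconds
    "net_bw":    10e9,   # network bandwidth, bytes/s
}

def cpu_task(flops, bytes_moved, hw):
    """Cost of a compute kernel: the larger of its compute-bound
    and memory-bound times (a simple roofline-style estimate)."""
    return max(flops / hw["cpu_flops"], bytes_moved / hw["mem_bw"])

def msg_task(size_bytes, hw):
    """Cost of a point-to-point message: latency plus transfer time."""
    return hw["net_lat"] + size_bytes / hw["net_bw"]

# Application model a static analyzer might produce for one iteration
# of a stencil sweep followed by a halo exchange (workloads assumed).
model = [
    ("stencil_kernel", cpu_task(1e9, 4e8, HARDWARE)),
    ("halo_exchange",  msg_task(8 * 1024, HARDWARE)),
]

predicted = sum(t for _, t in model)
print(f"predicted runtime: {predicted:.6f} s")
```

A real framework of this kind would derive the task list and workload counts from the program's abstract syntax tree rather than hand-coding them, and would replay the tasks inside a discrete-event simulator so that contention and overlap are modeled rather than simply summed.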
Index Terms
- Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations