DOI: 10.1145/3200921.3200937
research-article
Public Access

Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations

Published: 14 May 2018

ABSTRACT

Parallel application performance models provide valuable insight into how applications behave on real systems. Tools that deliver fast, accurate, and comprehensive prediction and evaluation of high-performance computing (HPC) applications and system architectures are therefore of considerable value. This paper presents PyPassT, an analysis-based modeling framework built on static program analysis and integrated simulation of the target HPC architectures. Specifically, the framework analyzes application source code written in C with OpenACC directives and transforms it into an application model describing the code's computation and communication behavior (including CPU and GPU workloads, memory accesses, and message-passing transactions). The application model is then executed on a simulated HPC architecture for performance analysis. Preliminary experiments demonstrate that the proposed framework reproduces the runtime behavior of benchmark applications with good accuracy.
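The abstract's pipeline (extract an application model of compute and communication phases, then replay it against a machine model) can be illustrated with a minimal sketch. This is a hypothetical toy, not the actual PyPassT implementation: the `Compute`/`Send` task types, the LogP-style `Machine` parameters, and `predict_runtime` are all invented here for illustration.

```python
# Hypothetical sketch of analysis-based performance prediction:
# an application model is a sequence of compute and communication
# tasks, replayed against a simple latency/bandwidth machine model.
from dataclasses import dataclass


@dataclass
class Compute:
    flops: float        # floating-point operations in this phase


@dataclass
class Send:
    nbytes: int         # message size in bytes


@dataclass
class Machine:
    flops_per_sec: float  # assumed sustained compute rate
    latency: float        # per-message latency (seconds)
    bandwidth: float      # link bandwidth (bytes/second)


def predict_runtime(tasks, machine):
    """Accumulate predicted wall time over all tasks in the model."""
    t = 0.0
    for task in tasks:
        if isinstance(task, Compute):
            t += task.flops / machine.flops_per_sec
        elif isinstance(task, Send):
            t += machine.latency + task.nbytes / machine.bandwidth
    return t


# An "application model" a static analyzer might emit for a tiny kernel:
# compute, exchange an 8 MiB halo, compute again.
model = [Compute(1e9), Send(8 * 1024**2), Compute(5e8)]
machine = Machine(flops_per_sec=1e10, latency=1e-6, bandwidth=1e10)
print(predict_runtime(model, machine))
```

A real framework would derive the task list from static analysis of the source (loop trip counts, OpenACC kernel regions, MPI calls) and use a far richer architecture simulation, but the replay structure is the same.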


Published in: SIGSIM-PADS '18: Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 2018, 224 pages. ISBN: 9781450350921. DOI: 10.1145/3200921

Copyright © 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: SIGSIM-PADS '18 paper acceptance rate: 15 of 46 submissions (33%). Overall acceptance rate: 398 of 779 submissions (51%).
