
Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System

  • Conference paper
High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation (PMBS 2013)

Abstract

Conventional programming practices on multicore processors in high performance computing architectures fall short in efficiency and scalability for many algorithms in scientific computing. One possible way to improve the efficiency and scalability of applications on this class of machines is a many-tasking runtime system employing many lightweight, concurrent threads. Yet it is not well understood how to estimate, a priori, the performance and scalability impact of such runtime systems on existing applications developed around the bulk synchronous parallel (BSP) model. In this work, we present a case study of a BSP particle-in-cell benchmark code that has been ported to a many-tasking runtime system. The 3-D Gyrokinetic Toroidal Code (GTC) is examined in its original MPI form and compared with a port to the High Performance ParalleX 3 (HPX-3) runtime system. Phase overlap, oversubscription behavior, and work rebalancing in the implementation are explored. Results for GTC obtained with the SST/macro simulator complement the implementation results. Finally, an analytic performance model for GTC is presented to guide future implementation efforts.
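To make "phase overlap" and "work rebalancing" concrete, the toy model below compares the cost of one timestep under three idealized assumptions. It is a minimal sketch, not the analytic performance model from the paper: the function names, the zero-overhead overlap, and the perfectly even redistribution of work are all hypothetical simplifications chosen for illustration.

```python
# Toy timestep-cost model (hypothetical; NOT the paper's analytic GTC model).
# compute_times: per-rank compute time for one step; comm_time: communication time.

def bsp_step_time(compute_times, comm_time):
    """BSP step: all ranks compute, wait at a barrier for the slowest rank,
    then communicate. Phases are strictly sequential."""
    return max(compute_times) + comm_time


def overlapped_step_time(compute_times, comm_time):
    """Idealized phase overlap: communication runs concurrently with
    computation, so the step is bounded by the longer of the two phases."""
    return max(max(compute_times), comm_time)


def rebalanced_overlapped_step_time(compute_times, comm_time):
    """Idealized overlap plus work rebalancing: total compute work is spread
    evenly across ranks (assumes perfectly divisible work, no overhead)."""
    balanced = sum(compute_times) / len(compute_times)
    return max(balanced, comm_time)


if __name__ == "__main__":
    compute = [1.0, 2.0, 3.0]  # three ranks with imbalanced work (arbitrary units)
    comm = 1.5
    print(bsp_step_time(compute, comm))                    # 4.5
    print(overlapped_step_time(compute, comm))             # 3.0
    print(rebalanced_overlapped_step_time(compute, comm))  # 2.0
```

Real runs fall between these bounds: task-scheduling overhead in the runtime and data dependencies between phases keep overlap and rebalancing from ever being perfect.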

Author information

Corresponding author

Correspondence to Matthew Anderson.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Anderson, M., Brodowicz, M., Kulkarni, A., Sterling, T. (2014). Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation. PMBS 2013. Lecture Notes in Computer Science, vol 8551. Springer, Cham. https://doi.org/10.1007/978-3-319-10214-6_7

  • DOI: https://doi.org/10.1007/978-3-319-10214-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10213-9

  • Online ISBN: 978-3-319-10214-6

  • eBook Packages: Computer Science, Computer Science (R0)
