Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System

Anderson, Matthew; Brodowicz, Maciej; Kulkarni, Abhishek; Sterling, Thomas

doi:10.1007/978-3-319-10214-6_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8551)

Included in the following conference series:

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

Abstract

Conventional programming practices on multicore processors in high performance computing architectures are not universally effective in terms of efficiency and scalability for many algorithms in scientific computing. One possible solution for improving efficiency and scalability in applications on this class of machines is the use of a many-tasking runtime system employing many lightweight, concurrent threads. Yet a priori estimation of the potential performance and scalability impact of such runtime systems on existing applications developed around the bulk synchronous parallel (BSP) model is not well understood. In this work, we present a case study of a BSP particle-in-cell benchmark code which has been ported to a many-tasking runtime system. The 3-D Gyrokinetic Toroidal code (GTC) is examined in its original MPI form and compared with a port to the High Performance ParalleX 3 (HPX-3) runtime system. Phase overlap, oversubscription behavior, and work rebalancing in the implementation are explored. Results for GTC using the SST/macro simulator complement the implementation results. Finally, an analytic performance model for GTC is presented in order to guide future implementation efforts.

Author information

Authors and Affiliations

School of Informatics and Computing, Center for Research in Extreme Scale Technologies, Indiana University, Bloomington, Indiana
Matthew Anderson, Maciej Brodowicz, Abhishek Kulkarni & Thomas Sterling

Corresponding author

Correspondence to Matthew Anderson.

Editor information

Editors and Affiliations

University of Warwick, Coventry, West Midlands, United Kingdom
Stephen A. Jarvis
University of Warwick, Coventry, West Midlands, United Kingdom
Steven A. Wright
Sandia National Laboratories, CSRI, Albuquerque, New Mexico, USA
Simon D. Hammond

Cite this paper

Anderson, M., Brodowicz, M., Kulkarni, A., Sterling, T. (2014). Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation. PMBS 2013. Lecture Notes in Computer Science, vol 8551. Springer, Cham. https://doi.org/10.1007/978-3-319-10214-6_7

DOI: https://doi.org/10.1007/978-3-319-10214-6_7
Published: 01 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10213-9
Online ISBN: 978-3-319-10214-6
eBook Packages: Computer Science, Computer Science (R0)
