Abstract
An upgrade from dual-core to quad-core AMD processors on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has resulted in significant changes in the hardware and software stack, including a deeper memory hierarchy, SIMD instructions, and a multi-core-aware MPI library. In this paper, we evaluate the impact of a subset of these key changes on large-scale scientific applications. We provide insights into the application tuning and optimization process and report how different strategies yield varying rates of success and failure across application domains. For instance, we demonstrate that the vectorization instructions (SSE) provide a performance boost of as much as 50% on fusion and combustion applications. Moreover, we reveal how resource contention can limit achievable performance and provide insights into how applications could exploit the hierarchical parallelism of the petascale XT5 system.
Keywords
- Direct Numerical Simulation
- International Thermonuclear Experimental Reactor
- Software Stack
- LINPACK Benchmark
- Embarrassingly Parallel
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alam, S.R., Barrett, R.F., Jagode, H., Kuehn, J.A., Poole, S.W., Sankaran, R. (2009). Impact of Quad-Core Cray XT4 System and Software Stack on Scientific Computation. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_33
DOI: https://doi.org/10.1007/978-3-642-03869-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)