Analyzing Performance of Selected NESAP Applications on the Cori HPC System

  • Conference paper
  • In: High Performance Computing (ISC High Performance 2017)

Abstract

NERSC has partnered with over 20 representative application developer teams to evaluate and optimize their workloads on the Intel® Xeon Phi™ Knights Landing (KNL) processor. In this paper, we summarize this two-year effort and the lessons we learned in the process. We analyze the overall performance improvements of these codes, quantifying the impact of both Xeon Phi™ architectural features and code optimization on application performance. We show that the architectural advantage, i.e. the average speedup of optimized code on KNL over optimized code on Haswell, is about 1.1×. The average speedup obtained through application optimization, i.e. comparing optimized and original codes on KNL, is about 5×.
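
Stated as formulas (our notation, not the paper's: t denotes application wall-clock time, the subscripts "opt" and "orig" the optimized and original codes, and HSW/KNL the Haswell and Knights Landing partitions):

\[
S_{\mathrm{arch}} = \frac{t^{\mathrm{HSW}}_{\mathrm{opt}}}{t^{\mathrm{KNL}}_{\mathrm{opt}}} \approx 1.1,
\qquad
S_{\mathrm{opt}} = \frac{t^{\mathrm{KNL}}_{\mathrm{orig}}}{t^{\mathrm{KNL}}_{\mathrm{opt}}} \approx 5.
\]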

Notes

  1. This includes the AVX-512 instruction subsets F, CD, ER and PF, but not VL, BW, DQ, IFMA and VBMI.

  2. numactl -p 1 mimics the behavior of numactl -m 1, but it is safer: if no free space remains in MCDRAM, execution does not abort and allocations fall back to DDR (see the library-level sketch below).
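
    The same "preferred" memory policy can be requested from inside an application via libnuma. The sketch below is a minimal illustration under our own assumptions (a flat-mode KNL node with MCDRAM exposed as NUMA node 1; compile and link with -lnuma); it is not taken from the paper's application codes.

    #include <numa.h>    /* libnuma: numa_available(), numa_set_preferred() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "libnuma not supported on this system\n");
            return 1;
        }

        /* Analogue of `numactl -p 1`: prefer NUMA node 1 (MCDRAM in flat
         * mode), but fall back to DDR (node 0) once MCDRAM is exhausted,
         * instead of failing as a strict `numactl -m 1` binding would. */
        numa_set_preferred(1);

        size_t n = (size_t)1 << 30;          /* 1 GiB scratch buffer */
        char *buf = malloc(n);
        if (buf == NULL) {
            perror("malloc");
            return 1;
        }

        /* Pages are physically placed on first touch, so the preferred
         * policy takes effect here, not at the malloc() call above. */
        memset(buf, 0, n);

        printf("allocated %zu bytes, preferring MCDRAM (node 1)\n", n);
        free(buf);
        return 0;
    }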

References

  1. BerkeleyGW website. http://www.berkeleygw.org

  2. CESM website. http://www.cesm.ucar.edu

  3. DOE Public Access Plan. https://energy.gov/downloads/doe-public-access-plan

  4. GROMACS website. http://www.gromacs.org

  5. HMMER website. http://hmmer.org/

  6. MILC website. http://physics.indiana.edu/~sg/milc.html

  7. NERSC and DOE Requirements Reviews Series. http://www.nersc.gov/science/hpc-requirements-reviews/

  8. NERSC NESAP applications. http://www.nersc.gov/users/computational-systems/cori/nesap/nesap-projects/

  9. NERSC NESAP case studies. http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/

  10. NERSC website. http://www.nersc.gov

  11. Qbox website. http://qboxcode.org

  12. Warp website. http://warp.lbl.gov

  13. XGC1 website. http://epsi.pppl.gov/computing/xgc-1

  14. Almgren, A.S., Bell, J.B., Lijewski, M.J., Lukić, Z., Van Andel, E.: Nyx: a massively parallel AMR code for computational cosmology. Astrophys. J. 765, 39 (2013)

  15. Barnes, T., Cook, B., Deslippe, J., Doerfler, D., Friesen, B., He, Y.H., Kurth, T., Koskela, T., Lobet, M., Malas, T., Oliker, L., Ovsyannikov, A., Sarje, A., Vay, J.L., Vincenti, H., Williams, S., Carrier, P., Wichmann, N., Wagner, M., Kent, P., Kerr, C., Dennis, J.: Evaluating and optimizing the NERSC workload on Knights Landing. In: Proceedings of the 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 2016), pp. 43–53. IEEE Press (2016)

  16. Barnes, T.A., Kurth, T., Carrier, P., Wichmann, N., Prendergast, D., Kent, P.R., Deslippe, J.: Improved treatment of exact exchange in Quantum ESPRESSO. Comput. Phys. Commun. 214, 52–58 (2017)

  17. Bauer, B., Gottlieb, S., Hoefler, T.: Performance modeling and comparative analysis of the MILC lattice QCD application su3_rmd. In: Proceedings of CCGRID 2012: IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (2012)

  18. Binder, S., Calci, A., Epelbaum, E., Furnstahl, R.J., Golak, J., Hebeler, K., Kamada, H., Krebs, H., Langhammer, J., Liebig, S., Maris, P., Meißner, U.G., Minossi, D., Nogga, A., Potter, H., Roth, R., Skibiński, R., Topolnicki, K., Vary, J.P., Witała, H.: Few-nucleon systems with state-of-the-art chiral nucleon-nucleon forces. Phys. Rev. C 93(4), 044002 (2016)

  19. Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012). http://www.sciencedirect.com/science/article/pii/S0010465511003912

  20. Doerfler, D., Austin, B., Cook, B., Deslippe, J., Kandalla, K., Mendygral, P.: Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing based Cori supercomputer at NERSC. In: Cray User Group Meeting (CUG), May 2017

  21. Doerfler, D., Deslippe, J., Williams, S., Oliker, L., Cook, B., Kurth, T., Lobet, M., Malas, T., Vay, J.-L., Vincenti, H.: Applying the roofline performance model to the Intel Xeon Phi Knights Landing processor. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) ISC High Performance 2016. LNCS, vol. 9945, pp. 339–353. Springer, Cham (2016). doi:10.1007/978-3-319-46079-6_24

  22. Eddy, S.R.: Accelerated profile HMM searches. PLOS Comput. Biol. 7(10), 1–16 (2011). https://doi.org/10.1371/journal.pcbi.1002195

  23. Edwards, R.G., Joó, B.: The Chroma software system for lattice QCD. Nucl. Phys. Proc. Suppl. 140, 832 (2005)

  24. Friesen, B., Almgren, A., Lukić, Z., Weber, G., Morozov, D., Beckner, V., Day, M.: In situ and in-transit analysis of cosmological simulations. Comput. Astrophys. Cosmol. 3, 4 (2016)

  25. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Corso, A.D., de Gironcoli, S., Fabris, S., Fratesi, G., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21(39), 395502 (2009). http://stacks.iop.org/0953-8984/21/i=39/a=395502

  26. Gygi, F.: Architecture of Qbox: a scalable first-principles molecular dynamics code. IBM J. Res. Dev. 52(1/2), 137–144 (2008). http://dl.acm.org/citation.cfm?id=1375990.1376003

  27. Hager, R., Yoon, E., Ku, S., D'Azevedo, E., Worley, P., Chang, C.: A fully non-linear multi-species Fokker-Planck-Landau collision operator for simulation of fusion plasma. J. Comput. Phys. 315, 644–660 (2016). http://www.sciencedirect.com/science/article/pii/S0021999116300298

  28. Hurrell, J., Holland, M., Gent, P., Ghan, S., Kay, J., Kushner, P., Lamarque, J.F., Large, W., Lawrence, D., Lindsay, K., Lipscomb, W., Long, M., Mahowald, N., Marsh, D., Neale, R., Rasch, P., Vavrus, S., Vertenstein, M., Bader, D., Collins, W., Hack, J., Kiehl, J., Marshall, S.: The Community Earth System Model: a framework for collaborative research. Bull. Am. Meteorol. Soc. 94, 1339–1360 (2013)

  29. Joó, B.: qphix package web page. http://jeffersonlab.github.io/qphix

  30. Joó, B.: qphix-codegen package web page. http://jeffersonlab.github.io/qphix-codegen

  31. Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6(1), 15–50 (1996). http://www.sciencedirect.com/science/article/pii/0927025696000080

  32. Ku, S., Chang, C., Diamond, P.: Full-f gyrokinetic particle simulation of centrally heated global ITG turbulence from magnetic axis to edge pedestal top in a realistic tokamak geometry. Nucl. Fusion 49(11), 115021 (2009)

  33. Lukić, Z., Stark, C.W., Nugent, P., White, M., Meiksin, A.A., Almgren, A.: The Lyman α forest in optically thin hydrodynamical simulations. Mon. Not. R. Astron. Soc. 446, 3697–3724 (2015)

  34. Maris, P., Caprio, M.A., Vary, J.P.: Emergence of rotational bands in ab initio no-core configuration interaction calculations of the Be isotopes. Phys. Rev. C 91(1), 014310 (2015)

  35. Maris, P., Vary, J.P., Navratil, P., Ormand, W.E., Nam, H., Dean, D.J.: Origin of the anomalous long lifetime of ¹⁴C. Phys. Rev. Lett. 106(20), 202502 (2011)

  36. Maris, P., Vary, J.P., Gandolfi, S., Carlson, J., Pieper, S.C.: Properties of trapped neutrons interacting with realistic nuclear Hamiltonians. Phys. Rev. C 87(5), 054318 (2013)

  37. Petersen, M.R., Jacobsen, D.W., Ringler, T.D., Hecht, M.W., Maltrud, M.E.: Evaluation of the arbitrary Lagrangian-Eulerian vertical coordinate method in the MPAS-Ocean model. Ocean Model. 86, 93–113 (2015). http://www.sciencedirect.com/science/article/pii/S1463500314001796

  38. Petrov, P.V., Newman, G.A.: Three-dimensional inverse modelling of damped elastic wave propagation in the Fourier domain. Geophys. J. Int. 198(3), 1599–1617 (2014)

  39. Petrov, P.V., Newman, G.A.: 3D finite-difference modeling of elastic wave propagation in the Laplace-Fourier domain. Geophysics 77(4), T137–T155 (2012). http://dx.doi.org/10.1190/geo2011-0238.1

  40. Pronk, S., Páll, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith, J.C., Kasson, P.M., van der Spoel, D., Hess, B., Lindahl, E.: GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29(7), 845 (2013). http://dx.doi.org/10.1093/bioinformatics/btt055

  41. Ringler, T., Petersen, M., Higdon, R.L., Jacobsen, D., Jones, P.W., Maltrud, M.: A multi-resolution approach to global ocean modeling. Ocean Model. 69, 211–232 (2013). http://www.sciencedirect.com/science/article/pii/S1463500313000760

  42. Straalen, B.V., Trebotich, D., Ovsyannikov, A., Graves, D.T.: Scalable structured adaptive mesh refinement with complex geometry. In: Exascale Scientific Applications: Programming Approaches for Scalability, Performance and Portability. CRC Press (in press)

  43. Trebotich, D., Adams, M.F., Molins, S., Steefel, C.I., Chaopeng, S.: High-resolution simulation of pore-scale reactive transport processes associated with carbon sequestration. Comput. Sci. Eng. 16(6), 22–31 (2014)

  44. Trebotich, D., Graves, D.: An adaptive finite volume method for the incompressible Navier-Stokes equations in complex geometries. Commun. Appl. Math. Comput. Sci. 10(1), 43–82 (2015)

  45. Vincenti, H., Lobet, M., Lehe, R., Sasanka, R., Vay, J.L.: An efficient and portable SIMD algorithm for charge/current deposition in particle-in-cell codes. Comput. Phys. Commun. 210, 145–154 (2017). http://www.sciencedirect.com/science/article/pii/S0010465516302764

  46. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). http://doi.acm.org/10.1145/1498765.1498785

  47. Williams, S.W.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-164.html

Acknowledgement

This research used resources of NERSC, a DOE Office of Science User Facility supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231. This article has been authored at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 and at UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the United States Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan [3].

Author information

Corresponding author

Correspondence to Thorsten Kurth.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kurth, T. et al. (2017). Analyzing Performance of Selected NESAP Applications on the Cori HPC System. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds.) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol. 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_25

  • DOI: https://doi.org/10.1007/978-3-319-67630-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67629-6

  • Online ISBN: 978-3-319-67630-2
