DOI: 10.1145/2063348.2063352

Sustained systems performance monitoring at the U.S. Department of Defense High Performance Computing Modernization Program

Published: 12 November 2011

ABSTRACT

The U.S. Department of Defense High Performance Computing Modernization Program (HPCMP) has implemented sustained systems performance (SSP) testing on the high performance computing systems in use at DoD Supercomputing Resource Centers. The intent is to monitor performance improvements delivered by updates to the operating system, compiler suites, and numerical and communications libraries, and to monitor performance penalties arising from security patches. In practice, each system's workload is simulated by an appropriate choice of user application codes representative of the HPCMP computational technology areas. Past successes include surfacing an imminent failure of an OST (a Lustre object storage target) in a Cray XT3, an incompletely configured scheduler update on an SGI Altix 4700, performance issues associated with a communications library update for a Linux Networx Advanced Technology Cluster, and intermittent resetting of Intel Nehalem cores from turbo mode to standard mode. This history demonstrates that SSP testing is critical to delivering the highest quality of service to HPCMP users.
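The methodology the abstract describes amounts to rerunning representative application codes after each system software change and comparing their timings against recorded baselines, so that regressions stand out. As a rough illustration only (this is not the HPCMP harness), the following Python sketch shows the basic shape of such a check; the benchmark names, baseline figures, the run_benchmark wrapper, the ./run_<name>.sh scripts, and the 5% tolerance are all hypothetical.

#!/usr/bin/env python3
"""Minimal sketch of an SSP-style regression check (hypothetical;
not the HPCMP harness). Compares fresh benchmark timings against
recorded baselines and flags runs outside an allowed tolerance."""

import statistics
import subprocess
import time

# Hypothetical baseline wall-clock times (seconds), recorded when the
# system was accepted, keyed by application benchmark name.
BASELINES = {
    "hycom_std": 1842.0,   # ocean-modeling workload
    "cth_shock": 2210.0,   # shock-physics workload
    "gamess_dft": 1530.0,  # quantum-chemistry workload
}

TOLERANCE = 0.05  # flag anything more than 5% slower than baseline


def run_benchmark(name, trials=3):
    """Run the named benchmark script several times and return the
    median wall-clock time; assumes ./run_<name>.sh exists."""
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(["./run_{}.sh".format(name)], check=True)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def check_system():
    """Return True only if every benchmark is within tolerance."""
    healthy = True
    for name, baseline in BASELINES.items():
        observed = run_benchmark(name)
        slowdown = (observed - baseline) / baseline
        ok = slowdown <= TOLERANCE
        print("{}: {:.1f}s vs {:.1f}s baseline ({:+.1%}) {}".format(
            name, observed, baseline, slowdown,
            "OK" if ok else "REGRESSION"))
        healthy = healthy and ok
    return healthy


if __name__ == "__main__":
    raise SystemExit(0 if check_system() else 1)

Run after each operating system, compiler, or library update, a nonzero exit status from a script of this shape would prompt exactly the kind of investigation the abstract recounts, such as the failing OST or the intermittent Nehalem turbo-mode resets.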


Published in

SC '11: State of the Practice Reports
November 2011, 242 pages
ISBN: 9781450311397
DOI: 10.1145/2063348
Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,516 of 6,373 submissions, 24%
