OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System

Abstract

With the increasing scale and intensity of the parallel I/O workloads generated by scientific applications running on high performance computing (HPC) facilities, understanding I/O dynamics, especially the root cause of I/O performance variability and degradation in HPC environments, has become critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analyzing these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation results demonstrate that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.

Authors:
Wan, Lipeng [1]; Wolf, Matthew [1]; Wang, Feiyi [1]; Choi, Jong Youl [1]; Ostrouchov, George [1]; Klasky, Scott [1]
  1. ORNL
Publication Date:
2017-06-01
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1474694
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) - Atlanta, Georgia, United States of America - 6/5/2017-6/8/2017
Country of Publication:
United States
Language:
English

Citation Formats

Wan, Lipeng, Wolf, Matthew, Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, and Klasky, Scott. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System. United States: N. p., 2017. Web. doi:10.1109/ICDCS.2017.257.
Wan, Lipeng, Wolf, Matthew, Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, & Klasky, Scott. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System. United States. https://doi.org/10.1109/ICDCS.2017.257
Wan, Lipeng, Wolf, Matthew, Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, and Klasky, Scott. 2017. "Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System". United States. https://doi.org/10.1109/ICDCS.2017.257. https://www.osti.gov/servlets/purl/1474694.
@inproceedings{osti_1474694,
title = {Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System},
author = {Wan, Lipeng and Wolf, Matthew and Wang, Feiyi and Choi, Jong Youl and Ostrouchov, George and Klasky, Scott},
abstractNote = {With the increasing scale and intensity of the parallel I/O workloads generated by scientific applications running on high performance computing (HPC) facilities, understanding I/O dynamics, especially the root cause of I/O performance variability and degradation in HPC environments, has become critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analyzing these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation results demonstrate that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.},
booktitle = {2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)},
doi = {10.1109/ICDCS.2017.257},
url = {https://www.osti.gov/biblio/1474694},
place = {United States},
year = {2017},
month = {6}
}
