Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System
Abstract
As scientific applications running on high-performance computing (HPC) facilities generate parallel I/O workloads of increasing scale and intensity, understanding I/O dynamics, especially the root causes of I/O performance variability and degradation in HPC environments, has become critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Our analysis of these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root causes of I/O performance variability. We then leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation demonstrates that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.
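The abstract reports two summary statistics over per-process write latency (the average and the maximum across processes) and a relative reduction for each. The following minimal Python sketch shows how those metrics are computed, assuming per-process latencies have already been collected. It also includes a simple, hypothetical latency-aware target selection heuristic to illustrate the general idea of performance-informed striping and placement; it is not the authors' middleware design. The function names, target ids such as `ost0`, and all sample numbers are illustrative assumptions, not data from the paper.

```python
from statistics import mean, median


def latency_summary(per_process_latencies):
    """Return (average, maximum) over a list of per-process write latencies."""
    if not per_process_latencies:
        raise ValueError("need at least one per-process latency")
    return mean(per_process_latencies), max(per_process_latencies)


def relative_reduction(baseline, optimized):
    """Fraction by which a latency metric dropped: (baseline - optimized) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline - optimized) / baseline


def pick_targets(target_samples, stripe_count):
    """Hypothetical heuristic: return the stripe_count storage targets whose
    measured latency samples have the lowest median (ties broken by target id)."""
    if not 0 < stripe_count <= len(target_samples):
        raise ValueError("stripe_count must be between 1 and the number of targets")
    ranked = sorted(target_samples, key=lambda t: (median(target_samples[t]), t))
    return ranked[:stripe_count]


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    baseline = [2.0, 2.0, 2.0, 10.0]  # one straggler process dominates the maximum
    optimized = [0.5, 0.5, 0.5, 2.5]
    base_avg, base_max = latency_summary(baseline)
    opt_avg, opt_max = latency_summary(optimized)
    print(f"average reduction: {relative_reduction(base_avg, opt_avg):.0%}")
    print(f"maximum reduction: {relative_reduction(base_max, opt_max):.0%}")

    # Hypothetical per-target latency samples (seconds) from repeated probe writes.
    samples = {
        "ost0": [1.0, 1.1, 5.0],
        "ost1": [0.4, 0.5, 0.6],
        "ost2": [2.0, 2.1, 2.2],
        "ost3": [0.7, 0.8, 0.9],
    }
    print(pick_targets(samples, 2))
```

Run as a script, this prints a 75% reduction for both metrics on the made-up inputs and selects `['ost1', 'ost3']`. The heuristic ranks targets by median rather than mean so that a single slow sample (the 5.0 s outlier on `ost0`) does not by itself decide the ranking, which is one simple way to account for the kind of variability the abstract describes.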
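- Authors:
- Wan, Lipeng; Wolf, Matthew; Wang, Feiyi; Choi, Jong Youl; Ostrouchov, George; Klasky, Scott
- Author Affiliation:
- ORNL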
- Publication Date:
- June 1, 2017
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1474694
- DOE Contract Number:
- AC05-00OR22725
- Resource Type:
- Conference
- Resource Relation:
- Conference: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) - Atlanta, Georgia, United States of America - June 5-8, 2017
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Wan, Lipeng, Wolf, Matthew, Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, and Klasky, Scott. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System. United States: N. p., 2017. Web. doi:10.1109/ICDCS.2017.257.
Wan, Lipeng, Wolf, Matthew, Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, & Klasky, Scott. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System. United States. https://doi.org/10.1109/ICDCS.2017.257
Wan, Lipeng, Wolf, Matthew, Wang, Feiyi, Choi, Jong Youl, Ostrouchov, George, and Klasky, Scott. 2017. "Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System". United States. https://doi.org/10.1109/ICDCS.2017.257. https://www.osti.gov/servlets/purl/1474694.
@inproceedings{osti_1474694,
title = {Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System},
author = {Wan, Lipeng and Wolf, Matthew and Wang, Feiyi and Choi, Jong Youl and Ostrouchov, George and Klasky, Scott},
abstractNote = {As scientific applications running on high-performance computing (HPC) facilities generate parallel I/O workloads of increasing scale and intensity, understanding I/O dynamics, especially the root causes of I/O performance variability and degradation in HPC environments, has become critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Our analysis of these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root causes of I/O performance variability. We then leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation demonstrates that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.},
doi = {10.1109/ICDCS.2017.257},
url = {https://www.osti.gov/biblio/1474694},
booktitle = {2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)},
address = {Atlanta, GA, USA},
place = {United States},
year = {2017},
month = jun
}