Lessons Learned from Optimizing the Sunway Storage System for Higher Application I/O Performance

  • Regular Paper
  • Published in: Journal of Computer Science and Technology

Abstract

It is hard for applications to fully utilize the peak bandwidth of the storage system in high-performance computers because of I/O interference, storage resource misallocation, and long, complex I/O paths. We performed several studies to bridge this gap in the Sunway storage system, which serves the supercomputer Sunway TaihuLight. To locate these issues and the connections between them, we developed an end-to-end performance monitoring and diagnosis tool to understand the I/O behaviors of applications and the system. With the help of the tool, we were able to find the root causes of such performance barriers at the I/O forwarding layer and the parallel file system (PFS) layer. An application-aware I/O forwarding allocation framework was used to address the I/O interference and resource misallocation at the I/O forwarding layer. A performance-aware data placement mechanism was proposed to mitigate the impact of I/O interference and the performance variation of storage devices in the PFS. Together, these optimizations gave applications much better I/O performance. During this process, we also proposed a lightweight storage stack to shorten the I/O path of applications with the N-N I/O pattern. This paper summarizes these studies and presents the lessons learned from the process.
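The abstract describes the performance-aware data placement mechanism only at a high level. The sketch below illustrates one way such a policy can work, under stated assumptions: a monitoring tool supplies recent per-device bandwidth samples, and the stripes of a new file are steered toward targets that have recently measured fast. The class name PerformanceAwarePlacer, its methods, and the weighted-sampling policy are illustrative assumptions, not the paper's actual design.

```python
import random
from collections import deque

class PerformanceAwarePlacer:
    """Illustrative sketch (not the paper's API): steer the stripes of a
    new file toward storage targets whose recently measured bandwidth is
    high, and away from slow or interfered devices."""

    def __init__(self, targets, window=16):
        # One sliding window of bandwidth samples (MB/s) per target,
        # assumed to be fed by an end-to-end monitoring tool.
        self.samples = {t: deque(maxlen=window) for t in targets}

    def record(self, target, mb_per_s):
        # Called whenever the monitor reports a bandwidth measurement.
        self.samples[target].append(mb_per_s)

    def _score(self, target):
        s = self.samples[target]
        # Targets with no measurements get a neutral score so they
        # still receive some traffic and produce fresh samples.
        return sum(s) / len(s) if s else 1.0

    def choose(self, stripe_count):
        # Weighted sampling without replacement: a target's chance of
        # receiving a stripe is proportional to its recent bandwidth.
        pool = list(self.samples)
        chosen = []
        for _ in range(min(stripe_count, len(pool))):
            weights = [self._score(t) for t in pool]
            pick = random.choices(pool, weights=weights, k=1)[0]
            chosen.append(pick)
            pool.remove(pick)
        return chosen

# Example: ost3 has been measured as slow, so it rarely gets stripes.
placer = PerformanceAwarePlacer([f"ost{i}" for i in range(4)])
for target, bw in [("ost0", 900), ("ost1", 850), ("ost2", 880), ("ost3", 90)]:
    placer.record(target, bw)
print(placer.choose(stripe_count=2))
```

In a real PFS the equivalent decision would be made over object storage targets at file creation time; the probabilistic weighting here is just one plausible way to avoid pinning all load onto the currently fastest device.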


Author information

Corresponding author: Qi Chen.

Electronic supplementary material

ESM 1 (PDF 301 kb)

About this article

Cite this article

Chen, Q., Chen, K., Chen, ZN. et al. Lessons Learned from Optimizing the Sunway Storage System for Higher Application I/O Performance. J. Comput. Sci. Technol. 35, 47–60 (2020). https://doi.org/10.1007/s11390-020-9798-5
