
Hierarchical Read–Write Optimizations for Scientific Applications with Multi-variable Structured Datasets

Published in: International Journal of Parallel Programming

Abstract

Large-scale scientific applications spend a significant amount of time reading and writing data. These simulations run on supercomputers architected with high-bandwidth, low-latency interconnects with complex topologies. Yet few efforts fully exploit these interconnect features for I/O. MPI-IO optimizations suffer from significant network contention at large core counts, making I/O a critical bottleneck at extreme scales. We propose HieRO, which leverages the fast interconnect and performs hierarchical I/O optimizations for scientific applications with structured datasets. HieRO performs reads and writes in multiple stages using carefully chosen leader processes that invoke the MPI-IO calls. Additionally, HieRO takes the application's domain decomposition and access patterns into account and fully utilizes the on-chip interconnect at each multicore node. We evaluate the efficacy of our optimizations with two scientific applications, WRF and S3D, whose I/O access patterns are common to a wide range of applications. We evaluate our approaches on two supercomputers, the Edison Cray XC30 and the Mira Blue Gene/Q, representing systems with diverse interconnects and parallel filesystems. We demonstrate that algorithmic changes can lead to significant improvements in parallel reads and writes. HieRO achieves more than \(40\times\) read-time improvement for WRF, and up to \(40\times\) read-time and \(13\times\) write-time improvements for S3D on 524,288 cores.
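The multi-stage, leader-based scheme summarized above can be illustrated with a small MPI sketch. The code below is not the authors' HieRO implementation; it is a minimal, hypothetical example of two-stage write aggregation in which the ranks on each compute node forward their data block to a node-local leader over the on-node interconnect, and only the leaders open the file and issue MPI-IO calls. The per-rank block size, the file name output.dat, and the assumption that the world ranks on a node are contiguous in the file layout are illustrative choices only.

    /*
     * Minimal sketch of two-stage, leader-based writing (not HieRO itself).
     * Block size, file name, and file layout are hypothetical.
     */
    #include <mpi.h>
    #include <stdlib.h>

    #define BLOCK_DOUBLES 1024          /* hypothetical per-rank block size */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int wrank;
        MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

        /* Stage 1: group ranks by compute node; node rank 0 is the leader. */
        MPI_Comm node;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node);
        int nrank, nsize;
        MPI_Comm_rank(node, &nrank);
        MPI_Comm_size(node, &nsize);

        /* Each rank owns one contiguous block of the dataset. */
        double *block = malloc(BLOCK_DOUBLES * sizeof(double));
        for (int i = 0; i < BLOCK_DOUBLES; i++)
            block[i] = (double)wrank;

        /* Leaders gather the node's blocks over the on-node interconnect. */
        double *nodebuf = NULL;
        if (nrank == 0)
            nodebuf = malloc((size_t)nsize * BLOCK_DOUBLES * sizeof(double));
        MPI_Gather(block, BLOCK_DOUBLES, MPI_DOUBLE,
                   nodebuf, BLOCK_DOUBLES, MPI_DOUBLE, 0, node);

        /* Stage 2: only the leaders open the file and write via MPI-IO. */
        MPI_Comm leaders;
        MPI_Comm_split(MPI_COMM_WORLD, (nrank == 0) ? 0 : MPI_UNDEFINED,
                       wrank, &leaders);
        if (nrank == 0) {
            MPI_File fh;
            MPI_File_open(leaders, "output.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

            /* Offset assumes a node's world ranks are contiguous in the file. */
            MPI_Offset off = (MPI_Offset)wrank * BLOCK_DOUBLES * sizeof(double);
            MPI_File_write_at_all(fh, off, nodebuf,
                                  nsize * BLOCK_DOUBLES, MPI_DOUBLE,
                                  MPI_STATUS_IGNORE);
            MPI_File_close(&fh);
            MPI_Comm_free(&leaders);
            free(nodebuf);
        }

        free(block);
        MPI_Comm_free(&node);
        MPI_Finalize();
        return 0;
    }

A hierarchical read would mirror this pattern: the leaders call MPI_File_read_at_all and then scatter the blocks to the other ranks on their node.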


Notes

  1. https://bitbucket.org/pmalakar/rw-benchmark.


Acknowledgments

This research was funded in part by, and used resources of, the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.

Author information

Corresponding author: Preeti Malakar.



Cite this article

Malakar, P., Vishwanath, V. Hierarchical Read–Write Optimizations for Scientific Applications with Multi-variable Structured Datasets. Int J Parallel Prog 45, 94–108 (2017). https://doi.org/10.1007/s10766-015-0388-z

