
Illuminating the I/O Optimization Path of Scientific Applications

  • Conference paper
High Performance Computing (ISC High Performance 2023)

Abstract

The existing parallel I/O stack is complex and difficult to tune due to the interdependencies among multiple factors that impact the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Although numerous tools collect I/O metrics on production systems, it is not obvious (unless one is an I/O expert) where the I/O bottlenecks are, what their root causes are, and what to do to solve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose Drishti, a novel interactive, user-oriented visualization and analysis framework. Drishti helps users pinpoint the root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators when seeking to improve an application’s I/O performance.
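
To make the kind of automated analysis described above concrete, the sketch below shows one plausible heuristic of the sort such a framework might apply to aggregated, Darshan-style I/O counters: flagging a job in which a large fraction of read requests are small. The counter names follow Darshan's POSIX size-histogram convention, but the `check_small_reads` helper, the 1 MiB cutoff, and the 10% threshold are illustrative assumptions for this example, not Drishti's actual checks or thresholds.

```python
# Hypothetical sketch: flag "too many small read requests" from
# aggregated, Darshan-style POSIX size-histogram counters.
# The 1 MiB cutoff and 10% threshold are illustrative assumptions only.
from typing import Optional

SMALL_BINS = [
    "POSIX_SIZE_READ_0_100",
    "POSIX_SIZE_READ_100_1K",
    "POSIX_SIZE_READ_1K_10K",
    "POSIX_SIZE_READ_10K_100K",
    "POSIX_SIZE_READ_100K_1M",
]
LARGE_BINS = [
    "POSIX_SIZE_READ_1M_4M",
    "POSIX_SIZE_READ_4M_10M",
    "POSIX_SIZE_READ_10M_100M",
    "POSIX_SIZE_READ_100M_1G",
    "POSIX_SIZE_READ_1G_PLUS",
]


def check_small_reads(counters: dict, threshold: float = 0.10) -> Optional[str]:
    """Return a recommendation if many read requests fall below 1 MiB."""
    small = sum(counters.get(b, 0) for b in SMALL_BINS)
    total = small + sum(counters.get(b, 0) for b in LARGE_BINS)
    if total == 0:
        return None  # no read activity recorded
    fraction = small / total
    if fraction > threshold:
        return (
            f"{fraction:.0%} of read requests are smaller than 1 MiB; "
            "consider aggregating requests (e.g., via MPI-IO collective "
            "reads) to reduce per-request overhead."
        )
    return None


if __name__ == "__main__":
    # Example aggregated counters for one job (values are made up).
    job = {"POSIX_SIZE_READ_0_100": 5000, "POSIX_SIZE_READ_1M_4M": 200}
    insight = check_small_reads(job)
    if insight:
        print(insight)
```

In practice, a framework of this kind would run many such checks over the collected profiling data and attach a human-readable recommendation to each triggered insight, broadly the style of output the abstract describes.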

Acknowledgment

This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research was also supported by The Ohio State University under a subcontract (GR130303), which was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research (ASCR) under contract number DE-AC02-05CH11231 with LBNL. This research used resources of the National Energy Research Scientific Computing Center under Contract No. DE-AC02-05CH11231.

Author information

Corresponding author

Correspondence to Jean Luca Bez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ather, H., Bez, J.L., Norris, B., Byna, S. (2023). Illuminating the I/O Optimization Path of Scientific Applications. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_2

  • DOI: https://doi.org/10.1007/978-3-031-32041-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32040-8

  • Online ISBN: 978-3-031-32041-5

  • eBook Packages: Computer Science, Computer Science (R0)
