Application IO Analysis with Lustre Monitoring Using LASSi for ARCHER

  • Conference paper
High Performance Computing (ISC High Performance 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12321)

Abstract

Supercomputers today have to support a complex workload, with new Big Data and AI workloads adding to the more traditional HPC ones. It is important that we understand these workloads, which constitute a mix of applications from different domains with different IO requirements. In some cases these applications place significant stress on the filesystem and may impact other applications that make use of the shared resource. Today ARCHER, the UK National Supercomputing Service, supports a diverse range of applications such as Climate Modelling, Bio-molecular Simulation, Material Science and Computational Fluid Dynamics. We describe LASSi, a framework developed by the ARCHER Centre of Excellence to analyse application slowdown and IO usage on the shared (Lustre) filesystem.

LASSi combines application job information from the scheduler with Lustre IO monitoring statistics to construct the IO profile of applications interacting with the filesystem. We show how the metric-based, application-centric approach taken by LASSi was used both to understand application contention and to reveal interesting aspects of IO on ARCHER. In this paper we concentrate on new analysis of several years of data collected from the ARCHER system. We study general IO usage and trends in different ARCHER projects, and highlight how different application groups interact with the filesystem by building a metric-based IO profile. This IO analysis of projects and applications enables project managers, HPC administrators, application developers and scientists not only to understand IO requirements but also to plan for the future. The information can further be used for re-engineering applications, planning resource allocations and sizing filesystems for future systems.

Supported by EPSRC.
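
To make the approach concrete, the following is a minimal Python sketch of the scheduler-to-Lustre join the abstract describes: attribute filesystem traffic to a job by matching monitoring samples against the job's node allocation and runtime window, then compare the job's IO rate with a filesystem-wide average. All field names, types and the risk formula here are illustrative assumptions for this sketch, not LASSi's actual schemas or its published metric definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


# Hypothetical record types: the real LASSi ingests ARCHER scheduler job
# logs and Lustre monitoring statistics, whose schemas differ from this.

@dataclass
class Job:
    job_id: str
    nodes: set[str]        # compute nodes allocated to the job
    start: datetime
    end: datetime


@dataclass
class LustreSample:
    node: str              # client node the counters are attributed to
    time: datetime
    read_kb: float
    write_kb: float
    metadata_ops: float


def job_io_profile(job: Job, samples: list[LustreSample]) -> dict[str, float]:
    """Attribute Lustre traffic to one job by joining monitoring samples
    against the job's node allocation and runtime window."""
    totals = {"read_kb": 0.0, "write_kb": 0.0, "metadata_ops": 0.0}
    for s in samples:
        if s.node in job.nodes and job.start <= s.time <= job.end:
            totals["read_kb"] += s.read_kb
            totals["write_kb"] += s.write_kb
            totals["metadata_ops"] += s.metadata_ops
    return totals


def risk_metric(job_rate: float, fs_avg_rate: float, alpha: float = 1.0) -> float:
    """Illustrative 'risk'-style metric: how far a job's IO rate sits
    above a scaled filesystem average (zero at or below the threshold).
    This shows only the general shape of a metric that flags jobs
    stressing the shared filesystem, not the published LASSi formula."""
    threshold = alpha * fs_avg_rate
    if threshold <= 0:
        return 0.0
    return max(0.0, (job_rate - threshold) / threshold)
```

The essential design choice this sketch mirrors is the application-centric join: filesystem-wide monitoring counters only become an application IO profile once they are keyed by which job owned which nodes at which time.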

Acknowledgement

This work used the ARCHER UK National Supercomputing Service. We would like to acknowledge EPSRC, EPCC, Cray-HPE, the ARCHER helpdesk and user community for their support.

Author information

Correspondence to Karthee Sivalingam.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sivalingam, K., Richardson, H. (2020). Application IO Analysis with Lustre Monitoring Using LASSi for ARCHER. In: Jagode, H., Anzt, H., Juckeland, G., Ltaief, H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12321. Springer, Cham. https://doi.org/10.1007/978-3-030-59851-8_16

  • DOI: https://doi.org/10.1007/978-3-030-59851-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59850-1

  • Online ISBN: 978-3-030-59851-8

  • eBook Packages: Computer Science, Computer Science (R0)
