Data Center Facility Monitoring with Physics Aware Approach

  • Conference paper
High Performance Computing. ISC High Performance 2022 International Workshops (ISC High Performance 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13387)

Abstract

The U.S. Department of Energy’s National Renewable Energy Laboratory (NREL) hosts one of the world’s most energy-efficient HPC data centers. The system uses component-level warm-water liquid cooling to efficiently remove heat from the data center and capture it for reuse in the building or rejection to the atmosphere. Given the complexity of this system, building data-driven tools for holistically monitoring and operating the entire data center is a priority for ensuring maximal efficiency and resiliency. In this advanced smart facility, over one million metrics are recorded per minute using a state-of-the-art streaming data architecture and software that capture and process the state of the system in real time. Here we detail two efforts to effectively analyze, visualize, and interpret this large-volume streaming data. First, we have developed a novel, flexible system for identifying and visualizing individual metric anomalies and component performance across the data center, using automatic metadata extraction and physically motivated visualization for quick interpretation. Second, to directly connect system maintenance to data stream processing, we explore a physics-informed multi-metric drift and anomaly detection application that detects scale buildup in heat exchangers.
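To make the idea of physics-informed multi-metric drift detection concrete, the following is a minimal sketch, not the method used in the paper (the abstract does not describe it). It assumes three temperature streams per heat exchanger (hot-side inlet, hot-side outlet, cold-side inlet, in °C), combines them into a single effectiveness value under the assumption that the hot stream has the smaller heat-capacity rate, and flags a sustained drop relative to an early baseline, as scale buildup might cause. The function names, window sizes, and the 0.05 threshold are all hypothetical.

```python
from statistics import mean


def effectiveness(t_hot_in, t_hot_out, t_cold_in):
    """Heat-exchanger effectiveness from three temperatures.

    Assumption: the hot stream has the smaller heat-capacity rate, so
    effectiveness = (T_hot_in - T_hot_out) / (T_hot_in - T_cold_in).
    """
    span = t_hot_in - t_cold_in
    if span <= 0:
        raise ValueError("hot inlet must be warmer than cold inlet")
    return (t_hot_in - t_hot_out) / span


def detect_drift(samples, baseline_n=5, window_n=5, max_drop=0.05):
    """Return the index of the first sample at which the rolling mean
    effectiveness has fallen more than max_drop below the baseline mean,
    or None if no such drift is seen.

    samples: list of (t_hot_in, t_hot_out, t_cold_in) tuples in time order.
    The baseline is the mean of the first baseline_n samples (hypothetical
    choice; a real deployment would calibrate it after cleaning).
    """
    eff = [effectiveness(*s) for s in samples]
    if len(eff) < baseline_n + window_n:
        return None
    baseline = mean(eff[:baseline_n])
    for end in range(baseline_n + window_n, len(eff) + 1):
        recent = mean(eff[end - window_n:end])
        if baseline - recent > max_drop:
            return end - 1
    return None


# Synthetic illustration: ten clean readings, then a warmer hot-side outlet.
clean = [(40.0, 25.0, 20.0)] * 10
fouled = [(40.0, 27.0, 20.0)] * 10
drift_index = detect_drift(clean + fouled)
```

Combining several raw metrics into one physically meaningful quantity means a drop in a single temperature caused by a load change is less likely to be mistaken for fouling than it would be with per-metric thresholds.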

Acknowledgements

This work was authored in part by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by U.S. Department of Energy Office of Energy Efficiency and Renewable Energy and Hewlett-Packard Enterprise.

Author information

Corresponding author

Correspondence to Hilary Egan.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Egan, H., Purkayastha, A., Sickinger, D. (2022). Data Center Facility Monitoring with Physics Aware Approach. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_17

  • DOI: https://doi.org/10.1007/978-3-031-23220-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23219-0

  • Online ISBN: 978-3-031-23220-6

  • eBook Packages: Computer Science, Computer Science (R0)
