Title: SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services

Conference

Microservices are a powerful new way of building, customizing, and deploying distributed services owing to their flexibility and maintainability. Several large-scale distributed platforms have emerged to serve the growing needs of data-centric workloads and services in commercial computing. Concurrently, high-performance computing (HPC) systems and software are rapidly evolving to meet the demands of diversified applications and heterogeneity. The interplay of hardware factors, software configuration parameters, and the flexibility offered by a microservice architecture makes it nontrivial to estimate the optimal service instantiation for a given application workload. This problem is exacerbated because these services operate in a dynamic and heterogeneous HPC environment. An optimally integrated service can be vastly more performant than a haphazardly integrated one. Existing performance tools for HPC either fail to understand the request-response model of communication inherent to microservices or operate within a narrow scope, limiting the insight that can be gleaned from employing them in isolation. We propose a methodology for integrated performance analysis of HPC microservices frameworks and applications called SYMBIOSYS. We describe its design and implementation within the context of the Mochi framework. This integration is achieved by combining distributed callpath profiling and tracing with a performance data exchange strategy that collects fine-grained, low-level metrics from the RPC communication library and network layers. The result is a portable, low-overhead performance analysis setup that provides a holistic profile of the dependencies among microservices and how they interact with the Mochi RPC software stack. Using HEPnOS, a production-quality Mochi data service, we demonstrate the low-overhead operation of SYMBIOSYS at scale and use it to identify the root causes of poorly performing service configurations.

Research Organization: Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization: USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR)
DOE Contract Number: AC02-06CH11357
OSTI ID: 1863758
Resource Relation: Conference: 35th IEEE International Parallel and Distributed Processing Symposium, 05/17/21 - 05/21/21, Portland, OR, US
Country of Publication: United States
Language: English
