OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Orchestration of materials science workflows for heterogeneous resources at large scale

Journal Article · International Journal of High Performance Computing Applications
Authors: [1]; [2]; [1]; [3]; [3]; [4]; [5]; [2]; [2]; [1]
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. Univ. of Utah, Salt Lake City, UT (United States)
  3. Idaho National Laboratory (INL), Idaho Falls, ID (United States)
  4. MicroTesting Solutions LLC, Hilliard, OH (United States)
  5. Johns Hopkins Univ., Laurel, MD (United States). Applied Physics Lab.

In the era of big data, materials science workflows must handle large-scale data distribution, storage, and computation, and any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) that mitigates these bottlenecks. We demonstrate its effectiveness for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation: it manages data storage efficiently and scales up compute resources on the cloud. The framework's software structure consists of three layers. The top layer uses Jupyter notebooks and serves as the user interface. The middle layer uses Ansible for resource deployment and management of the execution environment. The low layer is dedicated to resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU); at its core, Kubernetes provides resource management and Dask enables large-scale job scheduling across heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud's software stack from users who would otherwise need expertise in cloud technologies; we manage job scheduling efficiently and scalably; we enable resource elasticity and workflow orchestration at large scale; and we facilitate moving the study of nanoporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, its modular, multi-layer architecture allows it to be adapted to other applications and domains.
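
As a minimal, hypothetical sketch of how the low layer's heterogeneous scheduling can look from the Jupyter-facing side (this is not the authors' code: the scheduler address, the "GPU" resource tag, and the reconstruct/segment placeholders are assumptions), Dask's per-task resource annotations can pin reconstruction to GPU workers while segmentation runs on any CPU worker:

from dask.distributed import Client

def reconstruct(sinogram_path):
    # placeholder for GPU-based X-ray CT reconstruction
    return f"volume from {sinogram_path}"

def segment(volume):
    # placeholder for CPU-based segmentation of the reconstructed volume
    return f"labels for {volume}"

# hypothetical scheduler address; GPU workers are assumed to have been
# launched with `dask worker <scheduler> --resources "GPU=1"`
client = Client("tcp://dask-scheduler:8786")

# resources={"GPU": 1} restricts this task to workers advertising a GPU resource
volume = client.submit(reconstruct, "sinogram-000.h5", resources={"GPU": 1})
labels = client.submit(segment, volume)   # any worker may run this step

print(labels.result())

In the full framework these workers would be provisioned by the Ansible layer and managed by Kubernetes (for example via dask-kubernetes) rather than started by hand.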

Research Organization:
Idaho National Laboratory (INL), Idaho Falls, ID (United States)
Sponsoring Organization:
USDOE; National Science Foundation (NSF)
Grant/Contract Number:
AC07-05ID14517; 1841758; 2028923; 2103845; 2138811
OSTI ID:
1986538
Report Number(s):
INL/JOU-23-71771-Rev000; TRN: US2402761
Journal Information:
International Journal of High Performance Computing Applications, Vol. 37, Issue 3-4; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
