
Title: Enabling discovery data science through cross-facility workflows

Abstract

Experimental and observational instruments for scientific research (such as light sources, genome sequencers, accelerators, telescopes and electron microscopes) increasingly require High Performance Computing (HPC) scale capabilities for data analysis and workflow processing. Next-generation instruments are being deployed with higher resolutions and faster data capture rates, creating a big data crunch that cannot be handled by modest institutional computing resources. Often these big data analysis pipelines also require near real-time computing and have higher resilience requirements than the simulation and modeling workloads more traditionally seen at HPC centers. While some facilities have enabled workflows to run at a single HPC facility, there is a growing need to integrate capabilities across HPC facilities to enable cross-facility workflows, either to provide resilience to an experiment, increase analysis throughput capabilities, or to better match a workflow to a particular architecture. In this paper we describe the barriers to executing complex data analysis workflows across HPC facilities and propose an architectural design pattern for enabling scientific discovery using cross-facility workflows that includes orchestration services, application programming interfaces (APIs), data access and co-scheduling.

Authors:
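 Antypas, Katerina B. [1]; Bard, Deborah [1]; Blaschke, Johannes P. [1]; Canon, Richard Shane [2]; Enders, Bjoern [2]; Shankar, Mallikarjun [3]; Somnath, Suhas [3]; Stansberry, Dale [3]; Uram, Thomas [4]; Wilkinson, Sean [3]
  1. Lawrence Berkeley National Laboratory (LBNL)
  2. National Energy Research Scientific Computing Center (NERSC), California
  3. Oak Ridge National Laboratory (ORNL)
  4. Argonne National Laboratory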
Publication Date:
2021-12-01
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1847515
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: The 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2021, Orlando, Florida, United States of America, 12/15/2021 to 12/18/2021
Country of Publication:
United States
Language:
English

Citation Formats

MLA: Antypas, Katerina B., Bard, Deborah, Blaschke, Johannes P., Canon, Richard Shane, Enders, Bjoern, Shankar, Mallikarjun, Somnath, Suhas, Stansberry, Dale, Uram, Thomas, and Wilkinson, Sean. Enabling discovery data science through cross-facility workflows. United States: N. p., 2021. Web. doi:10.1109/BigData52589.2021.9671421.
APA: Antypas, Katerina B., Bard, Deborah, Blaschke, Johannes P., Canon, Richard Shane, Enders, Bjoern, Shankar, Mallikarjun, Somnath, Suhas, Stansberry, Dale, Uram, Thomas, & Wilkinson, Sean. Enabling discovery data science through cross-facility workflows. United States. https://doi.org/10.1109/BigData52589.2021.9671421
Chicago: Antypas, Katerina B., Bard, Deborah, Blaschke, Johannes P., Canon, Richard Shane, Enders, Bjoern, Shankar, Mallikarjun, Somnath, Suhas, Stansberry, Dale, Uram, Thomas, and Wilkinson, Sean. 2021. "Enabling discovery data science through cross-facility workflows". United States. https://doi.org/10.1109/BigData52589.2021.9671421. https://www.osti.gov/servlets/purl/1847515.
@inproceedings{osti_1847515,
title = {Enabling discovery data science through cross-facility workflows},
author = {Antypas, Katerina B. and Bard, Deborah and Blaschke, Johannes P. and Canon, Richard Shane and Enders, Bjoern and Shankar, Mallikarjun and Somnath, Suhas and Stansberry, Dale and Uram, Thomas and Wilkinson, Sean},
abstractNote = {Experimental and observational instruments for scientific research (such as light sources, genome sequencers, accelerators, telescopes and electron microscopes) increasingly require High Performance Computing (HPC) scale capabilities for data analysis and workflow processing. Next-generation instruments are being deployed with higher resolutions and faster data capture rates, creating a big data crunch that cannot be handled by modest institutional computing resources. Often these big data analysis pipelines also require near real-time computing and have higher resilience requirements than the simulation and modeling workloads more traditionally seen at HPC centers. While some facilities have enabled workflows to run at a single HPC facility, there is a growing need to integrate capabilities across HPC facilities to enable cross-facility workflows, either to provide resilience to an experiment, increase analysis throughput capabilities, or to better match a workflow to a particular architecture. In this paper we describe the barriers to executing complex data analysis workflows across HPC facilities and propose an architectural design pattern for enabling scientific discovery using cross-facility workflows that includes orchestration services, application programming interfaces (APIs), data access and co-scheduling.},
booktitle = {The 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2021},
doi = {10.1109/BigData52589.2021.9671421},
url = {https://www.osti.gov/biblio/1847515},
place = {United States},
year = {2021},
month = dec
}
