Title: Data Jockey: Automatic Data Management for HPC Multi-Tiered Storage Systems

Conference

We present the design and implementation of Data Jockey, a data management system for HPC multi-tiered storage systems. As a centralized data management control plane, Data Jockey automates bulk data movement and placement for scientific workflows and integrates into existing HPC storage infrastructures. Data Jockey simplifies data management by eliminating the human effort of programming complex data movements and of laying out datasets across multiple storage tiers in support of complex workflows, which in turn increases the usability of the multi-tiered storage systems emerging in modern HPC data centers.

Specifically, Data Jockey presents a new data management scheme called “goal-driven data management” that can automatically infer low-level bulk data movement plans from declarative high-level goal statements that come from the lifetime of iterative runs of scientific workflows. While doing so, Data Jockey aims to minimize data wait times by taking responsibility for datasets that are unused or to be used, and by aggressively utilizing the capacity of the upper, higher-performance storage tiers.

We evaluated a prototype implementation of Data Jockey under a synthetic workload based on a year’s worth of operational logs from the Oak Ridge Leadership Computing Facility (OLCF). Our evaluations suggest that Data Jockey leads to higher utilization of the upper storage tiers while minimizing the programming effort of data movement compared to human-involved, per-domain ad hoc data management scripts.
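To make the idea of inferring movement plans from declarative goals concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the Data Jockey implementation: the tier names, capacities, the `Dataset` type, and the `plan_moves` function are all hypothetical, and the demotion policy (push datasets without a goal one tier down until the target tier has room) is an assumption chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: int
    tier: str  # name of the tier currently holding the dataset


def plan_moves(tiers, capacity_gb, datasets, goals):
    """Turn declarative goals into (dataset, source, destination) moves.

    tiers:       tier names, ordered fastest first (hypothetical names).
    capacity_gb: tier name -> capacity in GB.
    datasets:    list of Dataset; the list is not modified.
    goals:       dataset name -> tier the dataset should reside on.
    Simplification: the capacity of the tier receiving demoted data is
    not checked.
    """
    location = {d.name: d.tier for d in datasets}
    size = {d.name: d.size_gb for d in datasets}
    used = {t: 0 for t in tiers}
    for d in datasets:
        used[d.tier] += d.size_gb

    moves = []

    def move(name, dest):
        # Record the move and update the simulated tier usage.
        src = location[name]
        moves.append((name, src, dest))
        used[src] -= size[name]
        used[dest] += size[name]
        location[name] = dest

    for name, target in goals.items():
        if location[name] == target:
            continue  # goal already satisfied
        idx = tiers.index(target)
        # Datasets on the target tier that no goal mentions may be demoted,
        # provided a lower tier exists.
        victims = []
        if idx + 1 < len(tiers):
            victims = [n for n, t in location.items()
                       if t == target and n not in goals]
        shortfall = used[target] + size[name] - capacity_gb[target]
        chosen = []
        for v in victims:
            if shortfall <= 0:
                break
            chosen.append(v)
            shortfall -= size[v]
        if shortfall > 0:
            continue  # goal cannot be met; a real system would defer or report it
        for v in chosen:
            move(v, tiers[idx + 1])
        move(name, target)
    return moves


# Hypothetical three-tier system and a single goal statement.
tiers = ["burst_buffer", "parallel_fs", "archive"]
capacity = {"burst_buffer": 100, "parallel_fs": 1000, "archive": 10**6}
datasets = [
    Dataset("old_run", 80, "burst_buffer"),
    Dataset("next_input", 60, "archive"),
]
goals = {"next_input": "burst_buffer"}

plan = plan_moves(tiers, capacity, datasets, goals)
for step in plan:
    print(step)
```

The caller states only where `next_input` should end up; the planner works out that `old_run` must first be demoted to free space on the fastest tier. The scheduling policy described in the paper itself is not specified in this record and may differ substantially.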

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1558554
Resource Relation:
Conference: 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019), Rio de Janeiro, Brazil, May 20-24, 2019
Country of Publication:
United States
Language:
English

Similar Records

Challenges and Opportunities of User-Level File Systems for HPC
Technical Report · Aug 23, 2017 · OSTI ID:1558554

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Journal Article · Jan 1, 2017 · OSTI ID:1558554

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Journal Article · Oct 1, 2018 · Journal of Open Source Software · OSTI ID:1558554