OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A View from ORNL: Scientific Data Research Opportunities in the Big Data Age

Conference
  1. ORNL
  2. Kitware
  3. Georgia Institute of Technology, Atlanta
  4. Rutgers University

One of the core issues across computer and computational science today is adapting to, managing, and learning from the influx of "Big Data". In the commercial space, this problem has led to huge investment in new technologies and capabilities that are well adapted to the human-generated logs, videos, texts, and other large-data artifacts being processed, and it has resulted in an explosion of useful platforms and languages (Hadoop, Spark, Pandas, etc.). However, translating this work from the enterprise space to the computational science and HPC community has proven somewhat difficult, in part because of fundamental differences in the type and scale of the data and in the timescales surrounding its generation and use. We describe a forward-looking research and development plan that centers on making Input/Output (I/O) intelligent for users in the scientific community, whether they are accessing scalable storage or performing in situ workflow tasks. Much of our work is based on our experience with the Adaptable I/O System (ADIOS 1.X) and with the next-generation version of the software, ADIOS 2.X [1].

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1468120
Resource Relation:
Conference: IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 7/2/2018 to 7/5/2018
Country of Publication:
United States
Language:
English

References (39)

Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems journal February 2017
GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution
  • Zheng, Fang; Yu, Hongfeng; Hantas, Can
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13 https://doi.org/10.1145/2503210.2503279
conference January 2013
Topology Mapping for Blue Gene/L Supercomputer conference November 2006
The global version of the gyrokinetic turbulence code GENE journal August 2011
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures journal May 2016
Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems journal January 2005
Visualization and Analysis Requirements for In Situ Processing for a Large-Scale Fusion Simulation Code
  • Kress, James; Pugmire, David; Klasky, Scott
  • 2016 Second Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV) https://doi.org/10.1109/ISAV.2016.014
conference November 2016
Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks journal August 2013
Handling Failures in Parallel Scientific Workflows Using Clouds
  • Costa, Flavio; de Oliveira, Daniel; Ocana, Kary
  • 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC) https://doi.org/10.1109/SC.Companion.2012.28
conference November 2012
Extending Skel to Support the Development and Optimization of Next Generation I/O Systems conference September 2017
Gyrokinetic neoclassical study of the bootstrap current in the tokamak edge pedestal with fully non-linear Coulomb collisions journal April 2016
SODA: Science-Driven Orchestration of Data Analytics conference August 2015
Event-based systems: opportunities and challenges at exascale conference January 2009
Service Augmentation for High End Interactive Data Services conference September 2005
Landrush: Rethinking In-Situ Analysis for GPGPU Workflows conference May 2016
DataSpaces: an interaction and coordination framework for coupled simulation workflows conference January 2010
Global and local gyrokinetic simulations of high-performance discharges in view of ITER journal May 2013
Big data provenance: Challenges, state of the art and opportunities conference October 2015
Global adjoint tomography: first-generation model journal September 2016
Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System conference June 2017
Electron Temperature Gradient Turbulence journal December 2000
Exacution: Enhancing Scientific Data Management for Exascale conference June 2017
Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces
  • Docan, Ciprian; Parashar, Manish; Cummings, Julian
  • 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS) https://doi.org/10.1109/IPDPS.2011.120
conference May 2011
TGE: Machine Learning Based Task Graph Embedding for Large-Scale Topology Mapping conference September 2017
Topology-aware task mapping for reducing communication contention on large parallel machines conference January 2006
Compressed ion temperature gradient turbulence in diverted tokamak edge journal May 2009
Machine Learning Predictions of Runtime and IO Traffic on High-End Clusters conference September 2016
Meteor: a middleware infrastructure for content‐based decoupled interactions in pervasive grid environments journal November 2007
GPUShare: Fair-Sharing Middleware for GPU Clouds conference May 2016
I/O performance challenges at leadership scale conference January 2009
Scientific workflow management and the Kepler system
  • Ludäscher, Bertram; Altintas, Ilkay; Berkley, Chad
  • Concurrency and Computation: Practice and Experience, Vol. 18, Issue 10 https://doi.org/10.1002/cpe.994
journal January 2006
Generic topology mapping strategies for large-scale parallel architectures conference January 2011
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data journal August 2014
Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales conference December 2017
Performance Modeling of In Situ Rendering
  • Larsen, Matthew; Harrison, Cyrus; Kress, James
  • SC16: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2016.23
conference November 2016
24/7 Characterization of petascale I/O workloads conference August 2009
A Multiplatform Study of I/O Behavior on Petascale Supercomputers
  • Luu, Huong; Winslett, Marianne; Gropp, William
  • Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '15 https://doi.org/10.1145/2749246.2749269
conference January 2015
SSD-optimized workload placement with adaptive learning and classification in HPC environments conference June 2014
Exascale Storage Systems the SIRIUS Way journal October 2016
