
Title: Canopus: A Paradigm Shift Towards Elastic Extreme-Scale Data Analytics on HPC

Conference

Scientific simulations on high performance computing (HPC) platforms generate large quantities of data. To bridge the widening gap between compute and I/O, and to enable data to be stored and analyzed more efficiently, simulation outputs need to be refactored, reduced, and appropriately mapped to storage tiers. However, a systematic solution to support these steps has been lacking in the current HPC software ecosystem. To that end, this paper develops Canopus, a progressive, JPEG-like data management scheme for storing and analyzing big scientific data. It co-designs data decimation, compression, and data storage, taking the hardware characteristics of each storage tier into consideration. With reasonably low overhead, our approach refactors simulation data into a much smaller, reduced-accuracy base dataset and a series of deltas that are used to augment the accuracy if needed. The base dataset and deltas are compressed and written to multiple storage tiers. Data saved on different tiers can then be selectively retrieved to restore the level of accuracy that the data analytics requires. Thus, Canopus provides a paradigm shift towards elastic data analytics and enables end users to make trade-offs between analysis speed and accuracy on the fly. We evaluate the impact of Canopus on unstructured triangular meshes, a pervasive data model used in scientific modeling and simulation. In particular, we demonstrate the progressive data exploration of Canopus using the “blob detection” use case on fusion simulation data.
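
The base-plus-deltas idea in the abstract can be illustrated with a minimal sketch. This is not the Canopus implementation: the paper works on unstructured triangular meshes, while the sketch assumes a plain 1D list, uses "keep every other sample" as a stand-in for mesh decimation, and predicts dropped samples by linear interpolation. The function names (`decimate`, `predict`, `refactor`, `restore`) are hypothetical, and compression and tier placement are left out.

```python
def decimate(values):
    """Keep every other sample (assumed stand-in for mesh decimation)."""
    return values[::2]


def predict(coarse, n):
    """Estimate n fine samples from a coarse level by linear interpolation."""
    fine = []
    for i in range(n):
        left = coarse[i // 2]
        if i % 2 == 0:
            fine.append(left)  # sample was kept at the coarse level
        else:
            # Midpoint of the two neighbours; reuse `left` at the boundary.
            has_right = i // 2 + 1 < len(coarse)
            right = coarse[i // 2 + 1] if has_right else left
            fine.append((left + right) / 2)
    return fine


def refactor(values, levels):
    """Split values into a small base and a list of deltas, coarsest first."""
    deltas = []
    current = list(values)
    for _ in range(levels):
        coarse = decimate(current)
        guess = predict(coarse, len(current))
        # A delta stores what the coarse level cannot predict.
        deltas.append([v - g for v, g in zip(current, guess)])
        current = coarse
    deltas.reverse()
    return current, deltas


def restore(base, deltas, use):
    """Rebuild the data from the base and only the first `use` deltas."""
    current = list(base)
    for delta in deltas[:use]:
        guess = predict(current, len(delta))
        current = [g + d for g, d in zip(guess, delta)]
    return current


values = [float(i * i) for i in range(9)]
base, deltas = refactor(values, levels=2)
coarse_view = restore(base, deltas, use=0)   # base only: fastest, least accurate
medium_view = restore(base, deltas, use=1)   # base plus one delta
full_view = restore(base, deltas, use=2)     # all deltas: original resolution
```

In this sketch the caller picks `use` to trade accuracy for the amount of data read, which mirrors the selective retrieval described above; in a tiered setup one might keep the base on the fastest tier and later deltas on slower ones, though that mapping is an assumption here.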

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1468029
Resource Relation:
Conference: 2017 IEEE International Conference on Cluster Computing (CLUSTER), Honolulu, Hawaii, United States of America, September 5-8, 2017
Country of Publication:
United States
Language:
English
