OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Compressing unstructured mesh data from simulations using machine learning

Abstract

The amount of data output from a computer simulation has grown to terabytes and petabytes as increasingly complex simulations are being run on massively parallel systems. As we approach exaflop computing in the next decade, it is expected that the I/O subsystem will not be able to write out these large volumes of data. In this paper, we explore the use of machine learning to compress the data before it is written out. Despite the computational constraints that limit us to using very simple learning algorithms, our results show that machine learning is a viable option for compressing unstructured data. Furthermore, we demonstrate that by simply using a better sampling algorithm to generate the training set, we can obtain more accurate results compared to random sampling, but at no extra cost. Further, by carefully selecting and incorporating points with high prediction error, we can improve reconstruction accuracy without sacrificing the compression rate.
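The idea in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the function names (`idw_predict`, `compress`, `reconstruct`), the inverse-distance-weighted nearest-neighbor predictor, the parameter values, and the synthetic 2-D data are all hypothetical choices made here. The sketch also assumes that the coordinates of every mesh point are available at reconstruction time, so only the variable values at the retained points are stored. Random sampling is used as the baseline sampler; a better-distributed sampler could be swapped in for `rng.sample`.

```python
import math
import random


def idw_predict(train_pts, train_vals, query, k=4):
    """Predict a value at `query` as the inverse-distance-weighted
    average of its k nearest retained points (a deliberately simple model)."""
    nearest = sorted(
        (math.dist(p, query), v) for p, v in zip(train_pts, train_vals)
    )[:k]
    if nearest[0][0] == 0.0:  # query coincides with a retained point
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


def compress(points, values, n_keep, n_refine, seed=0):
    """Return the indices of the points whose values are stored.

    n_keep - n_refine indices come from a random sample; the remaining
    n_refine are the dropped points with the largest prediction error,
    so the total budget (and hence the compression rate) is unchanged.
    """
    rng = random.Random(seed)
    kept = rng.sample(range(len(points)), n_keep - n_refine)
    kept_set = set(kept)
    pts = [points[i] for i in kept]
    vals = [values[i] for i in kept]
    errors = [
        (abs(idw_predict(pts, vals, points[i]) - values[i]), i)
        for i in range(len(points)) if i not in kept_set
    ]
    errors.sort(reverse=True)  # largest prediction error first
    kept += [i for _, i in errors[:n_refine]]
    return sorted(kept)


def reconstruct(points, kept, kept_vals):
    """Rebuild the full field: stored values are returned exactly,
    all other points are predicted from the stored ones."""
    stored = dict(zip(kept, kept_vals))
    pts = [points[i] for i in kept]
    return [
        stored[i] if i in stored else idw_predict(pts, kept_vals, points[i])
        for i in range(len(points))
    ]


# Usage on a synthetic, scattered 2-D point set (not data from the paper).
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(400)]
values = [math.sin(4 * x) * math.cos(3 * y) for x, y in points]
kept = compress(points, values, n_keep=80, n_refine=20)
approx = reconstruct(points, kept, [values[i] for i in kept])
max_err = max(abs(a - v) for a, v in zip(approx, values))
print(f"stored {len(kept)} of {len(points)} values, max abs error {max_err:.3f}")
```

Because the high-error points replace part of the initial sample rather than being added on top of it, the number of stored values stays at `n_keep` regardless of how the budget is split.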

Authors:
Kamath, Chandrika [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
2019-04-01
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1738887
Report Number(s):
LLNL-JRNL-750460
Journal ID: ISSN 2364-415X; 935302
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International journal of data science and analytics
Additional Journal Information:
Journal Volume: 9; Journal Issue: 1; Journal ID: ISSN 2364-415X
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Regression; compression; computer simulations; mesh data

Citation Formats

Kamath, Chandrika. Compressing unstructured mesh data from simulations using machine learning. United States: N. p., 2019. Web. doi:10.1007/s41060-019-00180-6.
Kamath, Chandrika. Compressing unstructured mesh data from simulations using machine learning. United States. https://doi.org/10.1007/s41060-019-00180-6
Kamath, Chandrika. 2019. "Compressing unstructured mesh data from simulations using machine learning". United States. https://doi.org/10.1007/s41060-019-00180-6. https://www.osti.gov/servlets/purl/1738887.
@article{osti_1738887,
title = {Compressing unstructured mesh data from simulations using machine learning},
author = {Kamath, Chandrika},
abstractNote = {The amount of data output from a computer simulation has grown to terabytes and petabytes as increasingly complex simulations are being run on massively parallel systems. As we approach exaflop computing in the next decade, it is expected that the I/O subsystem will not be able to write out these large volumes of data. In this paper, we explore the use of machine learning to compress the data before it is written out. Despite the computational constraints that limit us to using very simple learning algorithms, our results show that machine learning is a viable option for compressing unstructured data. Furthermore, we demonstrate that by simply using a better sampling algorithm to generate the training set, we can obtain more accurate results compared to random sampling, but at no extra cost. Further, by carefully selecting and incorporating points with high prediction error, we can improve reconstruction accuracy without sacrificing the compression rate.},
doi = {10.1007/s41060-019-00180-6},
url = {https://www.osti.gov/biblio/1738887},
journal = {International journal of data science and analytics},
issn = {2364-415X},
number = 1,
volume = 9,
place = {United States},
year = {2019},
month = {4}
}
