
Title: Adaptive Generation of Training Data for ML Reduced Model Creation

Conference

Machine learning proxy models are often used to speed up or completely replace complex computational models. The greatly reduced, deterministic computational cost of a proxy enables new use cases such as digital twin control systems and global optimization. The main challenge in building these proxy models is generating the training data. Naive uniform sampling of the input space can produce a non-uniform sampling of a model's output space. The resulting gaps in training data coverage can miss finer-scale details and lead to poor accuracy. While ever larger data sets could eventually fill these gaps, the computational burden of full-scale simulation codes can make this prohibitive. In this paper, we present an adaptive data generation method that uses uncertainty estimation to identify regions where the training data should be augmented. By targeting data generation at areas of need, representative data sets can be generated efficiently. The effectiveness of the method is demonstrated on a simple one-dimensional function and a complex multidimensional physics model.
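
The adaptive loop summarized in the abstract can be pictured with a minimal sketch. It is not the authors' implementation, and the record does not say which uncertainty estimator the paper uses. Everything named here is an assumption made for illustration: `expensive_model` is a hypothetical one-dimensional stand-in for a costly simulation, the proxy is a committee of piecewise-linear interpolants fit on random subsets of the data (a simple substitute for an ensemble of neural networks), and uncertainty is taken to be the standard deviation of the committee's predictions.

```python
import numpy as np


def expensive_model(x):
    # Hypothetical stand-in for a costly simulation:
    # nearly flat everywhere except for one sharp transition near x = 0.7.
    return np.tanh(20.0 * (x - 0.7))


def committee_predict(x_train, y_train, x_query, n_members=16, keep=0.7, rng=None):
    """Mean and spread of a committee of piecewise-linear proxies.

    Each member is fit on a random subset of the interior training points.
    Both end points are always kept so every member spans the full domain.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(x_train)
    xs, ys = x_train[order], y_train[order]
    n = xs.size
    interior = np.arange(1, n - 1)
    n_keep = max(1, int(keep * interior.size))
    preds = np.empty((n_members, x_query.size))
    for m in range(n_members):
        chosen = rng.choice(interior, size=n_keep, replace=False)
        idx = np.sort(np.concatenate(([0], chosen, [n - 1])))
        preds[m] = np.interp(x_query, xs[idx], ys[idx])
    return preds.mean(axis=0), preds.std(axis=0)


def adaptive_sample(n_initial=8, n_rounds=20, n_candidates=400, seed=0):
    """Start from a coarse uniform sample, then add points where the committee disagrees most."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_initial)
    y_train = expensive_model(x_train)
    # Candidate inputs: interior grid points that have not been simulated yet.
    candidates = np.linspace(0.0, 1.0, n_candidates + 2)[1:-1]
    for _ in range(n_rounds):
        _, spread = committee_predict(x_train, y_train, candidates, rng=rng)
        i = int(np.argmax(spread))          # most uncertain candidate
        x_new = candidates[i]
        candidates = np.delete(candidates, i)  # never request the same point twice
        x_train = np.append(x_train, x_new)
        y_train = np.append(y_train, expensive_model(x_new))  # one new "simulation"
    return x_train, y_train
```

Calling `adaptive_sample()` returns the initial uniform points followed by the adaptively chosen ones. In this toy setting the committee agrees in the flat regions and disagrees around the transition, so most new evaluations are requested there rather than being spread evenly over the input range.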

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1923172
Resource Relation:
Conference: The 4th International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2022, Virtual, Tennessee, United States of America, December 17-20, 2022
Country of Publication:
United States
Language:
English

