Adaptive Generation of Training Data for ML Reduced Model Creation
- ORNL
Machine learning proxy models are often used to speed up or completely replace complex computational models. Their greatly reduced and deterministic computational costs enable new use cases such as digital twin control systems and global optimization. The challenge in building these proxy models is generating the training data. Naive uniform sampling of a model's input space can produce non-uniform sampling of its output space. The resulting gaps in training-data coverage can miss finer-scale details and lead to poor accuracy. Although ever-larger data sets could eventually fill these gaps, the computational cost of full-scale simulation codes can make this prohibitive. In this paper, we present an adaptive data generation method that uses uncertainty estimation to identify regions where the training data should be augmented. By targeting data generation to these regions, representative data sets can be generated efficiently. We demonstrate the method's effectiveness on a simple one-dimensional function and a complex multidimensional physics model.
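The abstract does not specify which surrogate or uncertainty estimator the method uses. As an illustration only, the sketch below assumes a bootstrap ensemble of piecewise-linear interpolants as the surrogate and treats the spread of the ensemble's predictions as the uncertainty estimate. On each round it evaluates a stand-in "expensive" model at the most uncertain point of a candidate grid and adds that point to the training set. The names `expensive_model`, `bootstrap_ensemble`, and `adaptive_sampling`, the 1-D test function, and all parameter values are hypothetical and are not taken from the paper.

```python
import bisect
import math
import random
import statistics


def expensive_model(x: float) -> float:
    """Hypothetical stand-in for a costly simulation: smooth trend plus a narrow peak."""
    return math.sin(2.0 * math.pi * x) + 2.0 * math.exp(-((x - 0.7) / 0.03) ** 2)


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation on sorted, unique xs; clamps outside the data range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def bootstrap_ensemble(data, n_members, rng):
    """Build n_members surrogates, each on a bootstrap resample of (x, y) pairs."""
    members = []
    for _ in range(n_members):
        # Resample with replacement; the dict drops repeated x values.
        sample = dict(rng.choice(data) for _ in range(len(data)))
        xs = sorted(sample)
        members.append((xs, [sample[x] for x in xs]))
    return members


def predictive_std(members, x):
    """Uncertainty estimate: spread of the ensemble members' predictions at x."""
    return statistics.pstdev([interpolate(xs, ys, x) for xs, ys in members])


def adaptive_sampling(n_initial=8, n_added=20, n_members=16, n_candidates=400, seed=0):
    """Start from a coarse uniform design, then add points where uncertainty is highest."""
    rng = random.Random(seed)
    data = [(x, expensive_model(x)) for x in (i / (n_initial - 1) for i in range(n_initial))]
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    for _ in range(n_added):
        members = bootstrap_ensemble(data, n_members, rng)
        sampled = {x for x, _ in data}
        # Choose the not-yet-sampled candidate where the ensemble disagrees most.
        best = max(
            (c for c in candidates if c not in sampled),
            key=lambda c: predictive_std(members, c),
        )
        data.append((best, expensive_model(best)))  # the only "expensive" call per round
    return data


if __name__ == "__main__":
    points = adaptive_sampling()
    print(sorted(round(x, 3) for x, _ in points[8:]))
```

Running the script prints the locations chosen by the adaptive loop. Ensemble disagreement tends to be largest where neighboring training values differ sharply, so the added points gravitate toward steep or poorly resolved regions instead of being spread evenly. A bootstrap ensemble is used here only because it needs no external libraries; other uncertainty estimators (for example, Gaussian process variance or deep ensembles) could take its place in the same loop.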
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1923172
- Resource Relation:
- Conference: The 4th International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2022 - Virtual, Tennessee, United States of America - December 17-20, 2022
- Country of Publication:
- United States
- Language:
- English
Similar Records
Efficient data acquisition and training of collisional-radiative model artificial neural network surrogates through adaptive parameter space sampling
Simplified predictive models for CO2 sequestration performance assessment