Adaptive Generation of Training Data for ML Reduced Model Creation

Cianciosa, Mark; Archibald, Richard; Elwasif, Wael; Gainaru, Ana; Park, Jin Myung; Whitfield, Ross

doi:10.1109/BigData55660.2022.10020884

Conference · December 1, 2022

DOI: https://doi.org/10.1109/BigData55660.2022.10020884 · OSTI ID: 1923172

Author affiliation [1]: ORNL

Machine learning proxy models are often used to speed up or completely replace complex computational models. Their greatly reduced and deterministic computational cost enables new use cases such as digital twin control systems and global optimization. The main challenge in building these proxy models is generating the training data. A naive uniform sampling of a model's input space can produce a non-uniform sampling of its output space. The resulting gaps in training data coverage can miss finer-scale details and lead to poor accuracy. Ever larger data sets could eventually fill these gaps, but the computational burden of full-scale simulation codes can make this prohibitive. In this paper, we present an adaptive data generation method that uses uncertainty estimation to identify regions where the training data should be augmented. By targeting data generation to areas of need, representative data sets can be generated efficiently. The effectiveness of the method is demonstrated on a simple one-dimensional function and a complex multidimensional physics model.
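
The abstract does not say which uncertainty estimator or surrogate model the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes the uncertainty comes from disagreement within a bootstrap ensemble of small ridge-regression surrogates, and it uses a made-up one-dimensional function in place of an expensive simulation. All names (`expensive_model`, `fit_ensemble`, `adaptive_sample`) and all parameter values are hypothetical.

```python
import numpy as np

def expensive_model(x):
    # Hypothetical stand-in for a costly simulation:
    # a smooth background plus a narrow peak near x = 0.7.
    return np.sin(2.0 * np.pi * x) + np.exp(-((x - 0.7) / 0.03) ** 2)

def rbf_features(x, centers, width):
    # Gaussian radial basis features, shape (len(x), len(centers)).
    return np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)

def fit_ensemble(x, y, centers, width, n_members, rng, ridge=1e-3):
    # Each member is a ridge regression fit to a bootstrap resample of the data.
    weights = []
    for _ in range(n_members):
        idx = rng.integers(0, len(x), size=len(x))
        phi = rbf_features(x[idx], centers, width)
        a = phi.T @ phi + ridge * np.eye(len(centers))
        weights.append(np.linalg.solve(a, phi.T @ y[idx]))
    return np.array(weights)  # shape (n_members, len(centers))

def predict(weights, x, centers, width):
    # Ensemble mean and standard deviation at each query point.
    preds = rbf_features(x, centers, width) @ weights.T  # (len(x), n_members)
    return preds.mean(axis=1), preds.std(axis=1)

def adaptive_sample(n_init=10, n_rounds=5, batch=5, n_candidates=200, seed=0):
    rng = np.random.default_rng(seed)
    centers = np.linspace(0.0, 1.0, 25)
    width = 0.05
    # Start from a small uniform random sample of the input space.
    x = rng.uniform(0.0, 1.0, size=n_init)
    y = expensive_model(x)
    for _ in range(n_rounds):
        weights = fit_ensemble(x, y, centers, width, n_members=20, rng=rng)
        candidates = rng.uniform(0.0, 1.0, size=n_candidates)
        _, std = predict(weights, candidates, centers, width)
        # Run the expensive model only where the ensemble disagrees most.
        chosen = candidates[np.argsort(std)[-batch:]]
        x = np.concatenate([x, chosen])
        y = np.concatenate([y, expensive_model(chosen)])
    return x, y

x_train, y_train = adaptive_sample()
print(x_train.size)
```

Each round spends `batch` evaluations of the expensive model on the candidate inputs with the largest ensemble spread, rather than spreading the same budget uniformly. Other uncertainty estimators (for example Bayesian neural networks) would slot into the same loop in place of the bootstrap ensemble.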

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1923172

Resource Relation: Conference: The 4th International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2022, Virtual, Tennessee, United States of America, 12/17/2022 10:00 AM to 12/20/2022 10:00 AM

Country of Publication: United States

Language: English
