Creating a Tools Ecosystem for Cross-Discipline Environmental Data Reuse
- ORNL
Reusing data is difficult even within well-defined science communities, and it becomes harder when data from multiple communities and disciplines are combined. Through the lens of current work on constructing an environmental epidemiological data set from multiple disciplinary sources, we demonstrate the need for a new tool ecosystem to support heterogeneous big data science. Extending existing community standards for schemas and/or data formats through human auditing and wrangling of the data is not feasible at scale. This work therefore suggests new approaches for multi-disciplinary communities to build a shared tool ecosystem for big data. We discuss both the larger context of data wrangling of epidemiological data sets for novel artificial intelligence algorithms and the specific lessons from working with these multi-disciplinary data sets. Adopting a more model-driven, automatable approach not only promises better efficiency but also removes key sources of human-generated errors and promotes reuse and reproducibility of science data.
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1873827
- Resource Relation:
- Conference: IEEE International Conference on Big Data, held virtually (formerly Orlando, Florida, United States of America), 12/15/2021-12/18/2021
- Country of Publication:
- United States
- Language:
- English