Provenance-aware workflow for data quality management and improvement for large continuous scientific data streams
- ORNL
Data quality assessment, management, and improvement are an integral part of any big-data-intensive scientific research, ensuring accurate, reliable, and reproducible scientific discoveries. Maintaining data quality, however, is non-trivial and poses a challenge for a program like the Department of Energy’s Atmospheric Radiation Measurement (ARM), which collects data from hundreds of instruments across the world and distributes thousands of streaming data products that grow continuously in near-real time in an archive that is 1.7 petabytes in size and growing. We present a computational data processing workflow that collects data quality issues through an easy, intuitive web-based portal, which allows reporting of quality issues for any site, facility, or instrument at a granularity down to individual variables in the data files. The portal allows instrument specialists and scientists to provide corrective actions in the form of symbolic equations. A parallel processing framework applies the data improvements to large volumes of data in an efficient parallel environment while optimizing data transfer and file I/O operations. Corrected files are systematically versioned and archived. A provenance tracking module tracks and records any change made to the data during its entire life cycle, and these changes are communicated transparently to the scientific users. Developed in Python using open source technologies, the software architecture enables efficient and fast management and improvement of data in an operational data center environment.
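The abstract does not show the implementation. As a rough illustration only, the sketch below shows one way a correction supplied as a symbolic equation could be applied to a single variable, with the result versioned and a provenance entry recorded. Every name here (`evaluate`, `apply_correction`, the record layout, the `instrument_mentor` reporter) is a hypothetical stand-in and not part of the ARM software. The sketch uses plain Python lists in place of real data files and omits the parallel execution and I/O optimization that the workflow describes.

```python
import ast
import hashlib
import json
import operator
from datetime import datetime, timezone

# Arithmetic operators permitted in a correction equation (assumed subset).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate(equation, variables):
    """Evaluate a restricted arithmetic expression such as 'temp * 2 - 1'."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(equation, mode="eval"))


def checksum(values):
    """Hash a list of values so a provenance entry can identify the data it refers to."""
    return hashlib.sha256(json.dumps(values).encode("utf-8")).hexdigest()


def apply_correction(record, variable, equation, reporter):
    """Return a new versioned record and a provenance entry; the input is left unchanged."""
    corrected = [evaluate(equation, {variable: v}) for v in record["data"][variable]]
    new_record = {
        "version": record["version"] + 1,
        "data": {**record["data"], variable: corrected},
    }
    provenance = {
        "variable": variable,
        "equation": equation,
        "reporter": reporter,
        "from_version": record["version"],
        "to_version": new_record["version"],
        "checksum_before": checksum(record["data"][variable]),
        "checksum_after": checksum(corrected),
        "applied_at": datetime.now(timezone.utc).isoformat(),
    }
    return new_record, provenance


# Hypothetical usage: rescale one variable and keep the original version intact.
original = {"version": 0, "data": {"temp": [10.0, 12.0, 14.0]}}
fixed, entry = apply_correction(original, "temp", "temp * 2 - 1", "instrument_mentor")
```

The sketch parses the equation into a syntax tree and evaluates only whitelisted arithmetic nodes, so user-supplied corrections cannot run arbitrary code. Because the function returns a new record instead of editing the old one, the earlier version stays available, and the provenance entry ties the two versions together.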
- Research Organization:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1761685
- Resource Relation:
- Conference
- Country of Publication:
- United States
- Language:
- English