Text document

Approaches for Automated Data Quality Analysis: Syntactic and Semantic Assessment

Date

2022

Publisher

Gesellschaft für Informatik, Bonn

Abstract

Data quality significantly influences data usability and plays an important role in data trading. This paper presents a data quality analysis (DQA) of data tables on two levels. The first, the so-called syntactic level, concerns the structure of the elements within the database; the second, the so-called semantic level, concerns the relationship between the elements in the database and the "real world". Based on a literature review, the most relevant data quality criteria and corresponding metrics were derived. Subsequently, a service for automated syntactic DQA was designed and implemented, based on heuristics, a data-centric approach, and the unsupervised machine learning clustering algorithm DBSCAN. In the next step, an automated semantic DQA service was developed as well. This approach examines data tables, for example, for missing relevant columns (i.e., semantic completeness). The output of the services is a data quality index derived from the automated analysis of various data quality criteria. This enables the assessment of data quality as well as the detection of potential for quality improvement, thereby increasing the value of tradeable data.
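The abstract names the building blocks but not the concrete metrics, DBSCAN parameters, or aggregation rule. The following Python sketch is a hypothetical illustration, not the authors' implementation: it assumes completeness is the share of non-missing cells, treats values that DBSCAN labels as noise as syntactic anomalies, and combines per-criterion scores into an index by a weighted mean. All function names, `eps`, `min_pts`, the example column, and the weights are assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import Optional, Sequence

# Hypothetical sketch of a syntactic DQA for one numeric column.
# Metric definitions, names, and parameters are illustrative assumptions.


def completeness(column: Sequence[Optional[float]]) -> float:
    """Share of non-missing cells in a column (1.0 means no gaps)."""
    if not column:
        return 0.0
    return sum(v is not None for v in column) / len(column)


def dbscan_1d(points: Sequence[float], eps: float, min_pts: int) -> list[int]:
    """Textbook DBSCAN on scalar values.

    Returns one cluster id per point; -1 marks noise. A point's neighbourhood
    includes the point itself. Brute-force O(n^2) search, fine for a sketch.
    """
    n = len(points)
    labels: list[Optional[int]] = [None] * n  # None = not yet visited
    cluster = -1

    def neighbours(i: int) -> list[int]:
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is also a core point: keep expanding
    # Every point has been visited, so None cannot remain; the guard satisfies the type.
    return [-1 if lbl is None else lbl for lbl in labels]


def non_outlier_share(column: Sequence[Optional[float]], eps: float, min_pts: int) -> float:
    """Share of present values that DBSCAN does not flag as noise."""
    present = [v for v in column if v is not None]
    if not present:
        return 0.0
    labels = dbscan_1d(present, eps, min_pts)
    return sum(label != -1 for label in labels) / len(labels)


def quality_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical index: weighted mean of per-criterion scores in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


if __name__ == "__main__":
    prices = [10.0, 10.5, 11.0, 10.2, None, 95.0]  # made-up example column
    scores = {
        "completeness": completeness(prices),
        "outlier_free": non_outlier_share(prices, eps=1.0, min_pts=2),
    }
    print(scores, quality_index(scores, {"completeness": 1.0, "outlier_free": 1.0}))
```

DBSCAN fits this kind of check because it needs no predefined number of clusters and labels isolated values as noise directly; in the example, the missing cell lowers completeness and the value 95.0 is flagged as noise. For real tables, scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`) would replace the brute-force helper.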

Description

Ahiagble, Agbodzea Pascal; Stein, Hannah (2022): Approaches for Automated Data Quality Analysis: Syntactic and Semantic Assessment. INFORMATIK 2022. DOI: 10.18420/inf2022_85. Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-720-3. pp. 1023-1036. Datenqualität und Qualitätsmetriken in der Datenwirtschaft (DQ; English: Data Quality and Quality Metrics in the Data Economy). Hamburg. 26-30 September 2022