Abstract:
Data quality issues can lead to significant financial losses, which necessitates the use of data profiling to monitor the statistical properties of datasets and detect ha...Show MoreMetadata
Abstract:
Data quality issues can lead to significant financial losses, which necessitates the use of data profiling to monitor the statistical properties of datasets and detect harmful deviations early. However, many existing data profiling solutions cannot perform multi-column feature analysis. On the other hand, machine learning algorithms, particularly decision trees, are effective at discovering non-linear and multivariate patterns within tabular data. In this study, we propose a framework that combines the interpretable pattern-mining capabilities of decision trees with time series forecasting to identify significant changes in data. We evaluate our framework on a real-world dataset from a leading telecommunications provider in Germany, which includes a known anomaly resulting from a faulty database entry. Our results indicate that our framework successfully detects data changes and provides interpretable descriptions for each anomaly, highlighting its relevancy for practitioners.
Published in: 2023 IEEE 25th Conference on Business Informatics (CBI)
Date of Conference: 21-23 June 2023
Date Added to IEEE Xplore: 25 July 2023
ISBN Information: