Abstract
Business decisions today rely heavily on data held in data warehouse (DWH) systems, which makes data quality (DQ) in the DWH a highly relevant topic. Consequently, solutions for monitoring and ensuring high data quality are needed that are sophisticated yet easy to use. This paper is based on the IQM4HD project, in which a prototype of an automated data quality monitoring system was designed and implemented. Specifically, we focus on expressing advanced data quality rules, such as checking whether data conforms to a certain time series or whether data deviates significantly in any of the dimensions of a data cube. We show how such rules can be expressed in our domain-specific language (DSL) RADAR, which was introduced in [10]. Because specifying such rules manually tends to be complex, it is particularly important to support the DQ manager in detecting and creating candidate rules by profiling historic data. We therefore also explain the data profiling component of our prototype and illustrate how advanced rules can be semi-automatically detected and suggested to the DQ manager.
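To make the idea of a time series conformance rule derived from profiling more concrete, the following minimal Python sketch may help. It is not RADAR syntax and not the prototype's implementation. It assumes a simple seasonal model in which each position of a season (for example, each weekday) gets a tolerance band of mean plus or minus a multiple of the standard deviation of historic values. The function names `profile_seasonal_bounds` and `conforms`, the parameters `season_length` and `tolerance`, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def profile_seasonal_bounds(history, season_length=7, tolerance=3.0):
    """Derive one (lower, upper) band per seasonal position from historic values.

    Assumption: a plain mean +/- tolerance * stdev band per position;
    a real profiling component may use a different model.
    """
    bounds = []
    for position in range(season_length):
        # All historic values observed at this position of the season
        values = history[position::season_length]
        if len(values) < 2:
            raise ValueError("need at least two full seasons of history")
        center, spread = mean(values), stdev(values)
        bounds.append((center - tolerance * spread, center + tolerance * spread))
    return bounds


def conforms(bounds, index, value):
    """Check a new observation at time index `index` against its seasonal band."""
    lower, upper = bounds[index % len(bounds)]
    return lower <= value <= upper


# Hypothetical history: four weeks of daily values with a weekly pattern
base = [100, 110, 120, 130, 140, 200, 210]
history = [b + week for week in range(4) for b in base]

# Profiling step: suggest rule parameters from historic data
bounds = profile_seasonal_bounds(history)

# Monitoring step: check the next observation (index 28, first day of week five)
is_ok = conforms(bounds, 28, 104)
```

In this sketch the profiling step produces the rule parameters and the monitoring step only evaluates them, which mirrors the split between suggesting a rule to the DQ manager and checking it afterwards.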
The project IQM4HD has been funded by the German Federal Ministry of Education and Research under grant no. 01IS15053A. We would also like to thank our partners SHS Viveon/mVise for implementing the prototype and CTS Eventim for providing important requirements and reviewing practical applicability of the prototype and concepts.
References
Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015). https://doi.org/10.1007/s00778-015-0389-y
Aggarwal, C.C.: Outlier Analysis, 1st edn. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-6396-2
Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 746–755. IEEE (2007). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4221723
Caruccio, L., Deufemia, V., Polese, G.: Relaxed functional dependencies - a survey of approaches. IEEE Trans. Knowl. Data Eng. 28(1), 147–165 (2016)
Chiang, F., Miller, R.J.: Discovering data quality rules. Proc. VLDB Endow. 1(1), 1166–1177 (2008). http://dl.acm.org/citation.cfm?id=1453980
Gupta, M., Gao, J., Aggarwal, C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014). https://doi.org/10.1109/TKDE.2013.184
Heine, F., Kleiner, C., Koschel, A., Westermayer, J.: The data checking engine: complex rules for data quality monitoring (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.682.7950&rep=rep1&type=pdf
Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice. OTexts, Melbourne (2018)
Li, X., Han, J.: Mining approximate top-k subspace anomalies in multi-dimensional time-series data. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 447–458. VLDB Endowment (2007)
Oelsner, T., Heine, F., Kleiner, C.: IQM4HD concepts. Technical report, University of Applied Sciences and Arts Hannover, Germany (2018). http://iqm4hd.wp.hs-hannover.de/ConceptsIQM4HD.pdf
Olson, J.E.: Data Quality: The Accuracy Dimension. Morgan Kaufmann, San Francisco (2003)
Schelter, S., Lange, D., Schmidt, P., Celikel, M., Biessmann, F., Grafberger, A.: Automating large-scale data quality verification. Proc. VLDB Endow. 11(12), 1781–1794 (2018). https://doi.org/10.14778/3229863.3229867. http://www.vldb.org/pvldb/vol11/p1781-schelter.pdf
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Heine, F., Kleiner, C., Oelsner, T. (2019). Automated Detection and Monitoring of Advanced Data Quality Rules. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_18
DOI: https://doi.org/10.1007/978-3-030-27615-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science, Computer Science (R0)