Abstract
This contribution introduces an approach to ETL-attached data quality management by means of an autonomous Data Quality Monitoring System. The Data Quality Monitor can be attached, via lightweight connectors, to already implemented ETL processes. It quantifies data quality and suggests measures when, for instance, the quality of a particular data package falls below a certain limit. The long-term vision of this approach is to correct corrupted data (semi-)automatically according to user-defined Data Quality Rules. The monitor is attached to an ETL process in two ways: by defining "snapshot points", where the data samples to be validated are collected, and by introducing "approval points", where an ETL process can be interrupted if the input data is corrupted. Because the Data Quality Monitor is an autonomous module that is attached to ETL processes rather than embedded in them, this approach supports the division of work between ETL developers and dedicated data quality engineers.
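To make the roles of snapshot points and approval points concrete, the following is a minimal Python sketch and not the authors' implementation. The class `DataQualityMonitor`, its methods, the `ApprovalDenied` exception, the `limit` parameter, the quality measure (share of rows passing all rules), and the `run_etl` function are all hypothetical names and choices introduced here for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]  # returns True when a row satisfies the rule


class ApprovalDenied(Exception):
    """Raised at an approval point when a sample's quality is below the limit."""


@dataclass
class DataQualityMonitor:
    """Hypothetical stand-alone monitor; the ETL code only calls its two hooks."""
    rules: List[Rule]
    limit: float = 0.9  # assumed minimum share of valid rows
    samples: Dict[str, List[Row]] = field(default_factory=dict)

    def snapshot(self, point: str, rows: List[Row]) -> None:
        """Snapshot point: collect a copy of the sample to be validated."""
        self.samples[point] = [dict(r) for r in rows]

    def quality(self, point: str) -> float:
        """Share of sampled rows satisfying every rule (1.0 for an empty sample)."""
        rows = self.samples[point]
        if not rows:
            return 1.0
        valid = sum(1 for r in rows if all(rule(r) for rule in self.rules))
        return valid / len(rows)

    def approve(self, point: str) -> None:
        """Approval point: interrupt the ETL process if quality is too low."""
        score = self.quality(point)
        if score < self.limit:
            raise ApprovalDenied(
                f"{point}: quality {score:.2f} below limit {self.limit:.2f}"
            )


def run_etl(rows: List[Row], monitor: DataQualityMonitor) -> List[Row]:
    """Toy ETL step with the monitor attached at one point after extraction."""
    monitor.snapshot("after_extract", rows)
    monitor.approve("after_extract")  # may raise and stop the process here
    # Transform step: only reached when the sample was approved.
    return [{**r, "amount": float(r["amount"])} for r in rows]
```

In this sketch the ETL function knows nothing about the rules themselves; it only calls `snapshot` and `approve`. That mirrors the separation the abstract describes, where rule definitions can be maintained independently of the ETL process.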
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lettner, C., Stumptner, R., Bokesch, KH. (2014). An Approach on ETL Attached Data Quality Management. In: Bellatreche, L., Mohania, M.K. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2014. Lecture Notes in Computer Science, vol 8646. Springer, Cham. https://doi.org/10.1007/978-3-319-10160-6_1
DOI: https://doi.org/10.1007/978-3-319-10160-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10159-0
Online ISBN: 978-3-319-10160-6
eBook Packages: Computer Science (R0)