Mining Data Quality Rules for Data Migrations: A Case Study on Material Master Data

Altendeitering, Marcel

doi:10.1007/978-3-030-89159-6_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13036)

Included in the following conference series:

International Symposium on Leveraging Applications of Formal Methods

Abstract

Master data sets are an important asset for organizations, and their quality must be high to ensure organizational success. At the same time, data migrations are complex projects that often result in impaired, lower-quality data sets. In particular, data quality issues that involve multiple attributes are difficult to identify and can only be resolved through manual data quality checks. In this paper, we investigate a real-world migration of material master data. Our goal is to ensure data quality by mining the target data set for data quality rules: during a migration, incoming data sets must comply with these rules before they can be migrated. To generate the rules, we used a support vector machine (SVM) for rules at the schema level and association rule learning for rules at the instance level. We found that both methods produce valuable rules and are suitable for ensuring quality in data migrations. Used together as an ensemble, the two methods can handle common real-world data characteristics such as sparsity and mixed values.
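The abstract does not spell out how instance-level rules are mined or enforced. As a minimal sketch, assuming rules of the form "if attributes X take values x, then attribute Y takes value y" and an Apriori-style level-wise search (one common way to do association rule learning; the paper's actual algorithm, thresholds, and attributes are not given here), the Python code below mines rules from a target data set and flags incoming records that violate them. The helper names, the attribute names and values, and the thresholds `min_support=0.3` and `min_confidence=0.9` are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]  # (attribute, value), e.g. ("base_unit", "KG")
Itemset = FrozenSet[Item]
Rule = Tuple[Itemset, Item, float, float]  # antecedent, consequent, support, confidence


def to_items(record: Dict[str, str]) -> Itemset:
    """Encode a record as attribute=value items; empty values are skipped (sparse data)."""
    return frozenset((k, v) for k, v in record.items() if v not in (None, ""))


def frequent_itemsets(transactions: List[Itemset], min_support: float,
                      max_len: int = 3) -> Dict[Itemset, float]:
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)
    k = 2
    while current and k <= max_len:
        candidates = set()
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset.
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def mine_rules(transactions: List[Itemset], min_support: float = 0.3,
               min_confidence: float = 0.9, max_len: int = 3) -> List[Rule]:
    """Derive rules X -> y (single-item consequent) from the frequent itemsets."""
    supports = frequent_itemsets(transactions, min_support, max_len)
    rules: List[Rule] = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # The antecedent is frequent too (downward closure), so it is in `supports`.
            confidence = support / supports[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


def violations(record: Dict[str, str], rules: List[Rule]) -> List[Tuple[Itemset, Item]]:
    """A record violates a rule if it matches the antecedent but not the consequent."""
    items = to_items(record)
    return [(ante, cons) for ante, cons, _, _ in rules
            if ante <= items and cons not in items]


# Hypothetical target records; attribute names and values are illustrative only.
target = [
    {"material_type": "RAW", "base_unit": "KG", "valuation_class": "3000"},
    {"material_type": "RAW", "base_unit": "KG", "valuation_class": "3000"},
    {"material_type": "RAW", "base_unit": "KG", "valuation_class": "3000"},
    {"material_type": "FIN", "base_unit": "PC", "valuation_class": "7920"},
    {"material_type": "FIN", "base_unit": "PC", "valuation_class": "7920"},
]
rules = mine_rules([to_items(r) for r in target], min_support=0.3, min_confidence=0.9)

# Hypothetical incoming record from the source system.
incoming = {"material_type": "RAW", "base_unit": "PC", "valuation_class": "3000"}
for antecedent, consequent in violations(incoming, rules):
    print(sorted(antecedent), "->", consequent)
```

Empty values are skipped rather than treated as a value of their own, which is one simple way to cope with sparse attributes. With the sample data, the incoming record breaks five of the 18 mined rules, among them `material_type=RAW -> base_unit=KG`, so it would be held back for review instead of being migrated.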

Author information

Authors and Affiliations

Fraunhofer ISST, Emil-Figge-Straße 91, 44227, Dortmund, Germany
Marcel Altendeitering

Corresponding author

Correspondence to Marcel Altendeitering.

Editor information

Editors and Affiliations

University of Limerick, Limerick, Ireland
Tiziana Margaria
TU Dortmund, Dortmund, Germany
Bernhard Steffen

About this paper

Cite this paper

Altendeitering, M. (2021). Mining Data Quality Rules for Data Migrations: A Case Study on Material Master Data. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. ISoLA 2021. Lecture Notes in Computer Science, vol 13036. Springer, Cham. https://doi.org/10.1007/978-3-030-89159-6_12

DOI: https://doi.org/10.1007/978-3-030-89159-6_12
Published: 12 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89158-9
Online ISBN: 978-3-030-89159-6
eBook Packages: Computer Science, Computer Science (R0)