Abstract
Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research on measuring data quality to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of data quality is not sufficient, since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool for continuously monitoring the quality of data in order to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within a real-world industrial machine learning application at Siemens.
Notes
1. http://griffin.apache.org (June 2019).
2. https://github.com/mobydq/mobydq (June 2019).
Acknowledgments
The research reported in this paper has been supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the Province of Upper Austria in the frame of the COMET center SCCH.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ehrlinger, L., Haunschmid, V., Palazzini, D., Lettner, C. (2019). A DaQL to Monitor Data Quality in Machine Learning Applications. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_17
DOI: https://doi.org/10.1007/978-3-030-27615-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science, Computer Science (R0)