Abstract
Data entered into databases such as a CMDB (Configuration Management Database) usually passes through a conformity check. The conventional approach is to write hundreds of rules to check data quality. This article presents several ideas for using ML algorithms to support the quality management of a CMDB. We focus on the naming conventions commonly used in CI (Configuration Item) attributes such as hostnames, serial numbers, and application names. Such attributes should be consistent with dictionary data (operating system names, vendors) and with existing relationships (location, applications). We review several strategies for feature extraction, including tokenization, and analyze the applicability of CNB, RVAE, and NN to this particular problem. We also report the results of experiments on a public dataset (USA car database) to demonstrate the efficiency of the approach and to inspire other researchers to work on similar topics. The algorithms used in the experiments are published as JupyterLab files.
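As an illustration of the feature-extraction step mentioned above, the following is a minimal sketch of tokenizing names into character n-grams and into letter/digit runs. It is not the published code: the hostnames, the function names, and the simple "fraction of known n-grams" score are assumptions made for this example, and the score only stands in for the classifiers named in the abstract.

```python
from collections import Counter
import re


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a padded, lower-cased string."""
    padded = f"^{text.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def shape_tokens(text):
    """Split a name into runs of letters, digits, or other characters."""
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]+", text)


def fit_ngram_counts(names, n=3):
    """Count the n-grams that occur in a list of reference names."""
    counts = Counter()
    for name in names:
        counts.update(char_ngrams(name, n))
    return counts


def conformity_score(name, counts, n=3):
    """Fraction of the name's n-grams already seen in the reference names."""
    grams = char_ngrams(name, n)
    if not grams:
        return 0.0
    seen = sum(1 for g in grams if counts[g] > 0)
    return seen / len(grams)


# Hypothetical hostnames following a "site-role-number" convention.
known = ["waw-db-001", "waw-db-002", "waw-web-001", "krk-web-014", "krk-db-003"]
counts = fit_ngram_counts(known)

print(shape_tokens("waw-db-001"))
print(conformity_score("krk-db-002", counts))
print(conformity_score("Johns_Laptop", counts))
```

A name built from familiar fragments scores close to 1, while a name that ignores the convention scores close to 0. In practice the same n-gram or token features could be passed to a trained classifier instead of this hand-written score.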
This work is supported by the Polish Minister of Education and Science as part of an implementation doctorate, grant No. DWD/5/0286/2021.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Niewiadomski, S., Mzyk, G. (2023). ML Support for Conformity Checks in CMDB-Like Databases. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science(), vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_33
DOI: https://doi.org/10.1007/978-3-031-42508-0_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42507-3
Online ISBN: 978-3-031-42508-0
eBook Packages: Computer Science, Computer Science (R0)