ML Support for Conformity Checks in CMDB-Like Databases

Niewiadomski, Szymon; Mzyk, Grzegorz

doi:10.1007/978-3-031-42508-0_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14126)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

Data entered into databases such as a CMDB (Configuration Management Database) usually passes through a conformity check. The conventional approach is to create hundreds of rules to check data quality. This article presents several ideas for using ML algorithms to support the quality management of a CMDB. We focus on naming conventions commonly used for CIs (Configuration Items): attributes such as hostnames, serial numbers, and application names. Such attributes should be consistent with dictionary data (operating system names, vendors) and existing relationships (location, applications). We review several strategies for feature extraction, including tokenization, and analyze the usability of CNB, RVAE, and NN for this particular problem. We also show the results of experiments on a public dataset (USA car database) to demonstrate the efficiency of the approach and to inspire other researchers to work on similar topics. The algorithms used in the experiments are published as JupyterLab files.
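To make the idea concrete, the following is a minimal sketch of a conformity check that tokenizes hostnames into character n-grams and classifies them with Complement Naive Bayes (assuming that is what CNB denotes here). It is not the authors' published code: the hostnames, the operating system labels, the `char_ngrams` helper, and the small `ComplementNB` class are all hypothetical and exist only for illustration. A record is flagged when the dictionary value predicted from the hostname disagrees with the recorded one.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Tokenize a string into lowercase character n-grams with boundary markers."""
    padded = f"^{text.lower()}$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class ComplementNB:
    """Minimal Complement Naive Bayes over character n-gram counts (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        per_class = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            per_class[label].update(char_ngrams(text))
        total = Counter()
        for counts in per_class.values():
            total.update(counts)
        self.vocab = set(total)
        self.weights = {}
        for c in self.classes:
            # Token counts from every class except c (the complement of c).
            complement = total - per_class[c]
            denom = sum(complement.values()) + self.alpha * len(self.vocab)
            self.weights[c] = {t: math.log((complement[t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, text):
        # Tokens never seen during training are ignored.
        counts = Counter(t for t in char_ngrams(text) if t in self.vocab)
        # Pick the class whose complement explains the text worst.
        scores = {c: sum(k * self.weights[c][t] for t, k in counts.items())
                  for c in self.classes}
        return min(scores, key=scores.get)


# Hypothetical CI records: (hostname, recorded operating system).
training = [
    ("lnx-web-01", "Linux"), ("lnx-web-02", "Linux"),
    ("lnx-db-01", "Linux"), ("lnx-app-03", "Linux"),
    ("win-db-01", "Windows"), ("win-app-01", "Windows"),
    ("win-web-04", "Windows"), ("win-db-02", "Windows"),
]
model = ComplementNB().fit([h for h, _ in training], [o for _, o in training])

# New records to check; the first one breaks the assumed naming convention.
new_records = [("lnx-db-07", "Windows"), ("win-app-09", "Windows")]
flagged = [(host, recorded, model.predict(host))
           for host, recorded in new_records
           if model.predict(host) != recorded]
print(flagged)
```

In this toy setting only the record whose hostname prefix contradicts its recorded operating system is flagged. A real CMDB would need far more training data, and a flagged record would more plausibly go to manual review than be rejected outright.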

This work is supported by the Polish Minister of Education and Science as part of an implementation doctorate, grant No. DWD/5/0286/2021.

Author information

Authors and Affiliations

Wroclaw University of Science and Technology, Wrocław, Poland
Szymon Niewiadomski & Grzegorz Mzyk

Corresponding author

Correspondence to Szymon Niewiadomski.

Editor information

Editors and Affiliations

Systems Research Institute of the Polish Academy of Sciences, Warsaw, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Krakow, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada

About this paper

Cite this paper

Niewiadomski, S., Mzyk, G. (2023). ML Support for Conformity Checks in CMDB-Like Databases. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science, vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_33

DOI: https://doi.org/10.1007/978-3-031-42508-0_33
Published: 14 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42507-3
Online ISBN: 978-3-031-42508-0
eBook Packages: Computer Science, Computer Science (R0)
