Abstract
Datasets have many applications: for example, they are used in software testing and serve as training data in artificial intelligence. In every case, they must be of good quality, i.e., consistent with the domain they represent. This work proposes an approach to checking the consistency between a dataset and its domain. The collected data is assumed to take the form of uninterpreted records, with nothing known beyond the attributes’ names. The domain is represented by an ontology in the form of a UML class diagram. The proposed method consists of two stages: first, a UML class diagram is generated from the dataset; then it is compared, using defined measures, with the diagram representing the domain ontology. A case study illustrates the approach. The proposed measures are shown to help find inconsistencies and improve data quality. The method thus enables the quality assessment of a data sample.
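The second stage, comparing the diagram generated from the data with the domain diagram, can be pictured with a minimal sketch. The abstract does not define the measures, so the sketch substitutes a plain Jaccard overlap of class names and attribute names; the flat `Diagram` representation, the function names, and the example data are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, Set

# A class diagram reduced to: class name -> set of attribute names.
# This flat representation is an assumption made for illustration only;
# it ignores associations, multiplicities, and generalizations.
Diagram = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_diagrams(extracted: Diagram, domain: Diagram) -> Dict[str, float]:
    """Compare a diagram extracted from data against a domain diagram.

    Jaccard overlap is a stand-in measure, not the one defined in the paper.
    """
    class_overlap = jaccard(set(extracted), set(domain))
    shared = set(extracted) & set(domain)
    # Average attribute overlap over classes present in both diagrams.
    attr_overlap = (
        sum(jaccard(extracted[c], domain[c]) for c in shared) / len(shared)
        if shared
        else 0.0
    )
    return {"class_overlap": class_overlap, "attribute_overlap": attr_overlap}


# Hypothetical example data, not taken from the paper's case study.
domain = {"Customer": {"id", "name", "email"}, "Order": {"id", "date"}}
extracted = {"Customer": {"id", "name"}, "Order": {"id", "date"}, "Misc": {"x"}}
scores = compare_diagrams(extracted, domain)
print(scores)
```

In this toy setting, a low class overlap would point to classes present in only one of the diagrams (here the hypothetical `Misc`), while a low attribute overlap would point to attributes missing from the data or absent from the ontology.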
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hnatkowska, B., Huzar, Z., Tuzinkiewicz, L. (2021). Consistency Assessment of Datasets in the Context of a Problem Domain. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12799. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_10
DOI: https://doi.org/10.1007/978-3-030-79463-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79462-0
Online ISBN: 978-3-030-79463-7
eBook Packages: Computer Science, Computer Science (R0)