Consistency Assessment of Datasets in the Context of a Problem Domain

Hnatkowska, Bogumila; Huzar, Zbigniew; Tuzinkiewicz, Lech

doi:10.1007/978-3-030-79463-7_10

Conference paper
First Online: 19 July 2021

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12799)

Abstract

Datasets have many applications: for example, they are used for software testing or serve as training data in artificial intelligence. In every case, they must be of good quality, i.e., consistent with the domain they represent. This work proposes an approach to checking the consistency between a dataset and its domain. The collected data is assumed to take the form of uninterpretable records, about which nothing is known except the attributes’ names. The domain is represented by an ontology in the form of a UML class diagram. The proposed method consists of two stages: first, a UML class diagram is generated from the dataset; then, it is compared, using defined measures, with the diagram representing the domain ontology. A case study illustrates the approach and shows that the proposed measures help to find inconsistencies and improve data quality. The method enables the quality assessment of a data sample.
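
The comparison stage described in the abstract can be pictured with a minimal sketch. The abstract does not define the paper's measures, so everything here is an assumption made for illustration: each class diagram is reduced to a mapping from class names to attribute names, and Jaccard similarity stands in for the "defined measures". The names `Diagram`, `jaccard`, and `compare_diagrams` are hypothetical and are not taken from the authors' tool. The first stage (deriving a diagram from the dataset) is not shown.

```python
# Hypothetical simplification: a class diagram is reduced to
# {class name -> set of attribute names}; associations are ignored.
Diagram = dict[str, set[str]]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_diagrams(extracted: Diagram, domain: Diagram) -> dict:
    """Compare a diagram extracted from data with a domain diagram.

    Jaccard similarity is a stand-in measure chosen for this sketch,
    not the measure defined in the paper.
    """
    extracted_names, domain_names = set(extracted), set(domain)
    shared = extracted_names & domain_names
    return {
        "class_similarity": jaccard(extracted_names, domain_names),
        "attribute_similarity": {
            name: jaccard(extracted[name], domain[name]) for name in sorted(shared)
        },
        # Domain classes with no counterpart in the data.
        "missing_in_dataset": sorted(domain_names - extracted_names),
        # Classes found in the data that the domain does not describe.
        "unknown_to_domain": sorted(extracted_names - domain_names),
    }


# Invented example diagrams, for illustration only.
domain = {
    "Customer": {"id", "name", "email"},
    "Order": {"id", "date", "total"},
    "Product": {"id", "price"},
}
extracted = {
    "Customer": {"id", "name"},
    "Order": {"id", "date", "total"},
    "Invoice": {"id"},
}

report = compare_diagrams(extracted, domain)
for key, value in report.items():
    print(key, value)
```

In this sketch, a low class-level score or a non-empty `unknown_to_domain` list would flag a candidate inconsistency for a human to inspect; how the paper actually weighs such findings is not stated in the abstract.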

Author information

Authors and Affiliations

Department of Applied Informatics, Wroclaw University of Science and Technology, 50-370, Wroclaw, Poland
Bogumila Hnatkowska, Zbigniew Huzar & Lech Tuzinkiewicz

Corresponding author

Correspondence to Bogumila Hnatkowska.

Editor information

Editors and Affiliations

i-SOMET Incorporate Association, Morioka, Japan
Hamido Fujita
Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Ali Selamat
Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin
Texas State University San Marcos, San Marcos, TX, USA
Moonis Ali

About this paper

Cite this paper

Hnatkowska, B., Huzar, Z., Tuzinkiewicz, L. (2021). Consistency Assessment of Datasets in the Context of a Problem Domain. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12799. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_10

DOI: https://doi.org/10.1007/978-3-030-79463-7_10
Published: 19 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79462-0
Online ISBN: 978-3-030-79463-7
eBook Packages: Computer Science, Computer Science (R0)