
Consistency Assessment of Datasets in the Context of a Problem Domain

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12799)

Abstract

Datasets have many applications: for example, they are used for software testing or serve as training data in artificial intelligence. In every case they must be of good quality, i.e., consistent with the domain they represent. This work proposes an approach to checking the consistency between a dataset and its domain. The collected data are assumed to take the form of uninterpreted records, about which nothing is known except the attribute names. The domain is represented by an ontology in the form of a UML class diagram. The proposed method consists of two stages: first, a UML class diagram is generated from the dataset; then it is compared, using defined measures, with the diagram representing the domain ontology. A case study illustrates the approach. The proposed measures are shown to help find inconsistencies and improve data quality. The method thus enables the quality of a data sample to be assessed.
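
The second stage, comparing a generated class diagram with the domain diagram, can be illustrated with a minimal sketch. The abstract does not define the paper's measures, so the sketch substitutes plain Jaccard similarity over attribute-name sets as a stand-in. The diagram representation (a mapping from class name to attribute names), the function names, and the sample data are all hypothetical, and the first stage (deriving the diagram from raw records) is assumed to have been done already.

```python
from typing import Dict, Set

# Hypothetical, simplified view of a class diagram:
# class name -> set of attribute names (associations are ignored here).
ClassDiagram = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two attribute sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_diagrams(extracted: ClassDiagram, domain: ClassDiagram) -> Dict[str, float]:
    """For each domain class, return its best attribute overlap with any extracted class.

    This is an assumed stand-in measure, not the one defined in the paper.
    """
    scores: Dict[str, float] = {}
    for name, attrs in domain.items():
        scores[name] = max(
            (jaccard(attrs, candidate) for candidate in extracted.values()),
            default=0.0,  # no extracted classes at all
        )
    return scores


# Illustrative data only: a domain diagram and a diagram assumed to have
# been generated from a dataset whose classes carry synthetic names.
domain = {
    "Person": {"name", "birth_date"},
    "Address": {"street", "city"},
}
extracted = {
    "C1": {"name", "birth_date"},
    "C2": {"street", "city", "zip"},
}

scores = compare_diagrams(extracted, domain)
for cls, score in scores.items():
    print(f"{cls}: {score:.2f}")
```

A low score for a domain class would flag a candidate inconsistency, such as an attribute present in the data but absent from the ontology, which a reviewer would then inspect by hand.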



Author information

Corresponding author

Correspondence to Bogumila Hnatkowska.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hnatkowska, B., Huzar, Z., Tuzinkiewicz, L. (2021). Consistency Assessment of Datasets in the Context of a Problem Domain. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12799. Springer, Cham. https://doi.org/10.1007/978-3-030-79463-7_10

  • DOI: https://doi.org/10.1007/978-3-030-79463-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79462-0

  • Online ISBN: 978-3-030-79463-7

  • eBook Packages: Computer Science, Computer Science (R0)
