Abstract
Confidentiality hinders the publication of authentic, labeled datasets of personal and enterprise data, although such datasets would be useful for evaluating knowledge graph construction approaches in industrial scenarios. We therefore plan to generate such data synthetically so that it appears as authentic as possible. We assume that knowledge workers follow certain habits when they produce or manage data; if so, generation patterns can be discovered and used by data generators to imitate real datasets. In this paper, we derive an initial set of 11 distinct patterns found in real spreadsheets from industry and demonstrate a suitable generator called Data Sprout that is able to reproduce them. We describe how the generator produces spreadsheets in general and what altering effects the implemented patterns have.
Acknowledgements
This work was funded by the BMBF project SensAI (grant no. 01IW20007).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Schröder, M., Jilek, C., Dengel, A. (2021). Dataset Generation Patterns for Evaluating Knowledge Graph Construction. In: Verborgh, R., et al. The Semantic Web: ESWC 2021 Satellite Events. ESWC 2021. Lecture Notes in Computer Science, vol 12739. Springer, Cham. https://doi.org/10.1007/978-3-030-80418-3_5
DOI: https://doi.org/10.1007/978-3-030-80418-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80417-6
Online ISBN: 978-3-030-80418-3
eBook Packages: Computer Science, Computer Science (R0)