Abstract
High-quality metadata can contribute substantially to large-scale data integration. In data lakes, metadata can reduce data fetch and access requests by supporting solutions that need only minimal data access, provided the metadata exists. Metadata discovery helps us understand data semantics, intrinsic and extrinsic data relationships, and the features that guide query processing, data management, and data integration. Metadata is mostly generated by manual annotation or discovered through data profiling. In our research, we aim to understand the available metadata and to create profiles that can serve as a ‘menu card’ for the other datasets in the data lake. In this paper, we present a technique for generating metadata profiles using goal-based and rule-based agents. To this end, we apply simple rules and guide agents with actionable goals to automatically categorize a metadata file. We evaluated our technique experimentally; the results show that it allows multiple metadata profiles to be compared in order to compute similarity and difference measures.
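As a rough illustration of the idea summarized above, the following minimal sketch shows how simple rules might categorize the fields of a metadata file into a profile, how a goal might be checked against that profile, and how two profiles could be compared. It is not the authors' implementation: the rule set, the category names, the functions `build_profile`, `goal_reached`, and `jaccard`, and the choice of Jaccard similarity as the comparison measure are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical rules: each pairs a category label with a predicate over
# a metadata field's (name, value). These are illustrative, not from the paper.
Rule = Tuple[str, Callable[[str, str], bool]]

RULES: List[Rule] = [
    ("temporal", lambda name, value: "date" in name.lower() or "time" in name.lower()),
    ("identifier", lambda name, value: name.lower().endswith("id")),
    ("numeric", lambda name, value: value.replace(".", "", 1).isdigit()),
]


def build_profile(metadata: Dict[str, str]) -> Set[str]:
    """Rule-based step: collect every category whose rule fires on some field."""
    profile: Set[str] = set()
    for name, value in metadata.items():
        for category, predicate in RULES:
            if predicate(name, value):
                profile.add(category)
    return profile


def goal_reached(profile: Set[str], goal: Set[str]) -> bool:
    """Goal-based step (assumed form): the goal is met when all required
    categories are present in the profile."""
    return goal <= profile


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure: shared categories over all categories."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two made-up metadata files, represented as field-name -> sample-value maps.
sales = {"order_id": "A17", "order_date": "2019-01-05", "amount": "12.50"}
sensors = {"sensor_id": "S1", "location": "Brussels"}

sales_profile = build_profile(sales)
sensors_profile = build_profile(sensors)
similarity = jaccard(sales_profile, sensors_profile)
difference = 1.0 - similarity
```

In this sketch a profile is just a set of category labels, which keeps the comparison step trivial; a richer profile (for example, per-field categories or counts) would call for a different similarity measure.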
Supported by Erasmus Mundus IT4BI-DC.
Acknowledgment
This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence-Doctoral College (IT4BI-DC).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Khalid, H., Zimányi, E. (2019). Using Rule and Goal Based Agents to Create Metadata Profiles. In: Welzer, T., et al. New Trends in Databases and Information Systems. ADBIS 2019. Communications in Computer and Information Science, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-030-30278-8_37
DOI: https://doi.org/10.1007/978-3-030-30278-8_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30277-1
Online ISBN: 978-3-030-30278-8
eBook Packages: Computer Science, Computer Science (R0)