Abstract
With the growing popularity and application of knowledge-based artificial intelligence, the scale of knowledge graph data is increasing dramatically. RDF, one of the mainstream models for knowledge graphs, is widely used to describe the characteristics of Web resources because of its simplicity and flexibility. However, RDF datasets are usually incomplete (missing information) and noisy, which hinders downstream tasks. An RDF entity can be characterized by its characteristic set, that is, the set of predicates of the entity. Since untyped entities can be assigned to their closest types by merging characteristic sets, merging characteristic sets optimally has become a crucial issue. In this paper, we address the Optimal Characteristic Set Merge Problem (OCSMP) and propose an Ontology-Aware Characteristic Set Merging algorithm, called OntoCSM, which extracts an ontology hierarchy from RDF characteristic sets and guides the merging process by optimizing an objective function. Extensive experiments on various datasets show that OntoCSM is generally more efficient than state-of-the-art algorithms, by orders of magnitude in the best case. The experiments also verify the accuracy and scalability of our method, showing that OntoCSM achieves results competitive with existing algorithms while being ontology-aware.
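To make the notion of a characteristic set concrete, the following minimal Python sketch computes the characteristic set of each subject in a toy triple list and assigns an untyped entity to the type with the most similar predicate set. It is an illustration only, not the OntoCSM algorithm: the abstract does not specify the objective function or the ontology hierarchy extraction, so Jaccard similarity is used as a stand-in assumption. The `ex:` identifiers, the function names, and the choice to exclude `rdf:type` from the characteristic set are likewise hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # prefixed name used for brevity in this toy example


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it emits.

    Assumption: rdf:type is left out so that typed and untyped
    entities can be compared on their remaining predicates.
    """
    preds = defaultdict(set)
    for s, p, _ in triples:
        preds[s]  # touch the key so subjects with only rdf:type still appear
        if p != RDF_TYPE:
            preds[s].add(p)
    return {s: frozenset(ps) for s, ps in preds.items()}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def closest_type(cs, typed_sets):
    """Return the type whose merged predicate set is most similar to cs."""
    return max(typed_sets, key=lambda t: jaccard(cs, typed_sets[t]))


# Hypothetical data: two typed entities and one untyped entity (ex:bob).
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:birthDate", "1990-01-01"),
    ("ex:paper1", "rdf:type", "ex:Paper"),
    ("ex:paper1", "ex:title", "OntoCSM"),
    ("ex:paper1", "ex:year", "2021"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:birthDate", "1985-05-05"),
    ("ex:bob", "ex:email", "bob@example.org"),
]

cs = characteristic_sets(triples)
types = {s: o for s, p, o in triples if p == RDF_TYPE}

# Merge the characteristic sets of all entities sharing a declared type.
typed_sets = {}
for s, t in types.items():
    typed_sets[t] = typed_sets.get(t, frozenset()) | cs[s]

# Assign each untyped entity to its closest type.
inferred = {s: closest_type(c, typed_sets) for s, c in cs.items() if s not in types}
print(inferred)
```

Here `ex:bob` shares two of its three predicates with the merged set for `ex:Person` and none with `ex:Paper`, so it is assigned to `ex:Person`. A greedy nearest-set rule like this ignores any hierarchy among types, which is the gap an ontology-aware merge is meant to address.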
Acknowledgment
This work is supported by the National Key Research and Development Program of China (2019YFE0198600), the National Natural Science Foundation of China (61972275), and the CCF-Huawei Database Innovation Research Plan (CCF-Huawei DBIR2019004B).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, P., Cai, S., Liu, B., Wang, X. (2021). OntoCSM: Ontology-Aware Characteristic Set Merging for RDF Type Discovery. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_22
DOI: https://doi.org/10.1007/978-3-030-73194-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73193-9
Online ISBN: 978-3-030-73194-6
eBook Packages: Computer Science, Computer Science (R0)