OntoCSM: Ontology-Aware Characteristic Set Merging for RDF Type Discovery

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12681)

Abstract

With the growing popularity and application of knowledge-based artificial intelligence, the scale of knowledge graph data is increasing dramatically. RDF, one of the mainstream models for knowledge graphs, is widely used to describe the characteristics of Web resources because of its simplicity and flexibility. However, RDF datasets are usually incomplete (lacking type information) and noisy, which hinders downstream tasks. An RDF entity can be characterized by its characteristic set, that is, the set of predicates of that entity. Since untyped entities can be assigned to their closest types by merging characteristic sets, merging characteristic sets optimally has become a crucial issue. In this paper, targeting the Optimal Characteristic Set Merge Problem (OCSMP), we propose an Ontology-Aware Characteristic Set Merging algorithm, called OntoCSM, which extracts an ontology hierarchy from RDF characteristic sets and guides the merging process by optimizing an objective function. Extensive experiments on various datasets show that OntoCSM is generally more efficient than the state-of-the-art algorithms, by orders of magnitude in the best case. The accuracy and scalability of our method have also been verified: OntoCSM achieves results competitive with the existing algorithms while being ontology-aware.
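
To make the notion of a characteristic set concrete, the following is a minimal Python sketch, not the OntoCSM algorithm itself. It collects the predicates of each subject into a characteristic set and assigns each entity to the closest of a few known types. The Jaccard similarity, the `ex:` identifiers, the type labels, and the function names are all assumptions chosen for illustration; the abstract does not specify the objective function or the merging procedure.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it appears with."""
    preds = defaultdict(set)
    for subject, predicate, _obj in triples:
        preds[subject].add(predicate)
    return {s: frozenset(ps) for s, ps in preds.items()}


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def closest_type(cs, typed_sets):
    """Return the type label whose characteristic set is most similar to cs."""
    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(typed_sets), key=lambda t: jaccard(cs, typed_sets[t]))


# Hypothetical toy data: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:acme", "ex:name", "Acme"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    ("ex:acme", "ex:foundedIn", "1999"),
]

# Hypothetical characteristic sets of two already-known types.
typed_sets = {
    "Person": frozenset({"ex:name", "ex:worksAt", "ex:birthDate"}),
    "Organization": frozenset({"ex:name", "ex:locatedIn", "ex:foundedIn"}),
}

cs = characteristic_sets(triples)
assignments = {entity: closest_type(pset, typed_sets) for entity, pset in cs.items()}
```

In this toy data, `ex:alice` shares two of three predicates with the assumed `Person` set and only one with `Organization`, so it is assigned to `Person`. A real merging approach would also have to decide when two characteristic sets should be merged at all, which is the optimization problem the paper addresses.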

Notes

  1. http://data.semanticweb.org/dumps/conferences/dc-2010-complete.rdf

  2. https://opendata.swiss/dataset

  3. http://dbpedia.org/

  4. http://datahub.io/fr/dataset/data-bnf-fr

Acknowledgment

This work is supported by the National Key Research and Development Program of China (2019YFE0198600), the National Natural Science Foundation of China (61972275), and the CCF-Huawei Database Innovation Research Plan (CCF-Huawei DBIR2019004B).

Author information

Corresponding author

Correspondence to Xin Wang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, P., Cai, S., Liu, B., Wang, X. (2021). OntoCSM: Ontology-Aware Characteristic Set Merging for RDF Type Discovery. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_22

  • DOI: https://doi.org/10.1007/978-3-030-73194-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73193-9

  • Online ISBN: 978-3-030-73194-6

  • eBook Packages: Computer Science, Computer Science (R0)
