Schema Discovery in RDF Data Sources

  • Conference paper
Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9381)

Abstract

The Web has become a huge information space consisting of interlinked datasets, enabling the design of new applications. The meaningful usage of these datasets is a challenge, as it requires some knowledge about their content such as their types and properties. In this paper, we present an automatic approach for schema discovery in RDF(S)/OWL datasets.

We consider a schema as a set of type and link definitions. Our contribution is twofold: (i) generating the types describing a dataset, along with a description for each of them called type profile; (ii) generating the semantic links between types as well as the hierarchical links through the analysis of type profiles. Our approach relies on a density-based clustering algorithm and it does not require any schema-related information in the dataset. We have implemented the proposed algorithms and we present some evaluation results showing the effectiveness of our approach.
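The type-discovery step can be illustrated with a minimal sketch. The abstract only states that a density-based clustering algorithm is used and that each type receives a type profile; the details below are assumptions made for illustration: each entity is reduced to the set of properties it uses, distance is the Jaccard distance between property sets, clustering is a plain DBSCAN with hypothetical parameters `eps` and `min_pts`, and a type profile is taken to be the relative frequency of each property within a cluster. The entity and property names are invented.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two property sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(items, eps, min_pts):
    """Plain DBSCAN over property sets; returns one label per item (-1 = noise)."""
    n = len(items)
    labels = [None] * n

    def neighbours(i):
        # Every item within eps of item i, including i itself.
        return [j for j in range(n) if jaccard_distance(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(reachable)
    return labels


def type_profiles(items, labels):
    """For each cluster, map every property to the share of members that use it."""
    profiles = {}
    for c in sorted(set(labels) - {-1}):
        members = [items[i] for i, label in enumerate(labels) if label == c]
        counts = Counter(p for member in members for p in member)
        profiles[c] = {p: counts[p] / len(members) for p in sorted(counts)}
    return profiles


# Invented toy data: entity identifier -> set of properties it uses.
entities = {
    "ex:alice": {"name", "email", "affiliation"},
    "ex:bob": {"name", "email"},
    "ex:carol": {"name", "email", "affiliation"},
    "ex:paper1": {"title", "year", "author"},
    "ex:paper2": {"title", "year", "author", "abstract"},
    "ex:paper3": {"title", "author"},
    "ex:odd": {"latitude"},
}

items = list(entities.values())
labels = dbscan(items, eps=0.5, min_pts=2)
profiles = type_profiles(items, labels)

for name, label in zip(entities, labels):
    print(name, label)
for cluster_id, profile in profiles.items():
    print(cluster_id, profile)
```

On this toy data the person-like and paper-like entities fall into two clusters and `ex:odd` is left as noise, which shows why a density-based method suits the setting: the number of types does not have to be fixed in advance, and no `rdf:type` declarations are consulted.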

Notes

  1. Silk: wifo5-03.informatik.uni-mannheim.de/bizer/silk.

  2. Conference: data.semanticweb.org/dumps/conferences/dc-2010-complete.rdf.

  3. BNF: datahub.io/fr/dataset/data-bnf-fr.

  4. DBpedia: dbpedia.org.

Acknowledgements

This work was partially funded by the French National Research Agency through the CAIR ANR-14-CE23-0006 project.

Author information

Corresponding authors

Correspondence to Kenza Kellou-Menouer or Zoubida Kedad.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kellou-Menouer, K., Kedad, Z. (2015). Schema Discovery in RDF Data Sources. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_36

  • DOI: https://doi.org/10.1007/978-3-319-25264-3_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25263-6

  • Online ISBN: 978-3-319-25264-3

  • eBook Packages: Computer Science, Computer Science (R0)
