Class Annotation Using Linked Open Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10033)

Abstract

The meaningful use of RDF datasets requires a description of their content. Part of this description is provided in the dataset itself through class definitions. However, the name of a class does not always accurately reflect its semantics. This meaning can be captured by annotating each class.

In this paper, we present a set of algorithms that exploit the instances of a dataset to provide the annotations that best capture the semantics of a class. These algorithms rely on an external knowledge source. We introduce three ways of extracting annotations: (i) using the names of instances, (ii) using their property sets, and (iii) considering the vocabularies used by the dataset. As the external source, we use Linked Open Data, which represents an unprecedented amount of knowledge available on the Web. We also show how annotations can be used to discover a class hierarchy, and we present evaluation results showing the effectiveness of our approach.
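To make the second extraction strategy (property sets) more concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not state which similarity measure or threshold the paper uses, so the Jaccard similarity, the `min_support` parameter, the function names, and the toy data are all assumptions made for illustration. The idea shown is to summarise a class by the properties its instances commonly use, then rank candidate annotations from an external source by how closely their property sets match that summary.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def class_profile(instances: Dict[str, Set[str]], min_support: float = 0.5) -> Set[str]:
    """Properties used by at least `min_support` of the instances (assumed threshold)."""
    counts = Counter(p for props in instances.values() for p in props)
    threshold = min_support * len(instances)
    return {p for p, c in counts.items() if c >= threshold}


def rank_annotations(
    profile: Set[str], candidates: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank candidate labels by property-set similarity to the class profile."""
    scored = [(label, jaccard(profile, props)) for label, props in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical instances of an unnamed class, each with its set of properties.
instances = {
    "ex:i1": {"foaf:name", "foaf:mbox", "ex:affiliation"},
    "ex:i2": {"foaf:name", "ex:affiliation"},
    "ex:i3": {"foaf:name", "foaf:mbox"},
}

# Hypothetical candidate annotations, as if retrieved from an external source.
candidates = {
    "Person": {"foaf:name", "foaf:mbox", "foaf:knows"},
    "Document": {"dc:title", "dc:creator"},
}

profile = class_profile(instances)
ranking = rank_annotations(profile, candidates)
print(ranking)
```

In practice the candidate property sets would come from querying a Linked Open Data source rather than from a hard-coded dictionary; that retrieval step is left out here because the abstract does not describe it.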


Notes

  1. Conference: data.semanticweb.org/dumps/conferences/dc-2010-complete.rdf

  2. BNF: datahub.io/fr/dataset/data-bnf-fr

  3. DBpedia: dbpedia.org


Acknowledgments

This work was partially funded by the French National Research Agency through the CAIR ANR-14-CE23-0006 project.

Author information

Corresponding author

Correspondence to Kenza Kellou-Menouer.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Kellou-Menouer, K., Kedad, Z. (2016). Class Annotation Using Linked Open Data. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_44

  • DOI: https://doi.org/10.1007/978-3-319-48472-3_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48471-6

  • Online ISBN: 978-3-319-48472-3

  • eBook Packages: Computer Science, Computer Science (R0)
