Abstract
The meaningful usage of RDF datasets requires a description of their content. Part of this description is provided in the dataset itself through class definitions. However, the name of a class does not always accurately reflect its semantics. This meaning can be captured by providing annotations for each class.
In this paper, we present a set of algorithms exploiting the instances of a dataset in order to provide annotations which best capture the semantics of a class. These algorithms rely on an external knowledge source. We introduce three ways of extracting annotations: (i) using the names of instances, (ii) using their property sets and (iii) considering the vocabularies used by the dataset. As an external source, we have used Linked Open Data, which represents an unprecedented amount of knowledge provided on the Web. We also show how annotations can be used to discover a class hierarchy and we present some evaluation results showing the effectiveness of our approach.
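As an illustration of idea (i), the following is a minimal sketch of annotating a class from the names of its instances. It is not the algorithm of the paper: the abstract does not give its details, so this sketch assumes a simple frequency vote over the type labels returned by an external source. The dictionary `TOY_KNOWLEDGE_SOURCE` and the function `candidate_annotations` are hypothetical names, and the dictionary only stands in for a real lookup against Linked Open Data.

```python
from collections import Counter

# Hypothetical stand-in for an external knowledge source such as Linked Open Data:
# it maps an instance name to the type labels the source associates with it.
TOY_KNOWLEDGE_SOURCE = {
    "Paris": ["City", "Place"],
    "Berlin": ["City", "Place"],
    "Danube": ["River", "Place"],
}


def candidate_annotations(instance_names, knowledge_source):
    """Rank type labels by the fraction of class instances they cover.

    Assumption: a plain frequency vote, used here only for illustration.
    """
    total = len(instance_names)
    if total == 0:
        return []
    counts = Counter()
    for name in instance_names:
        # Count each label once per instance; unknown names contribute nothing.
        for label in set(knowledge_source.get(name, [])):
            counts[label] += 1
    # Sort by decreasing coverage, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(label, count / total) for label, count in ranked]


if __name__ == "__main__":
    instances = ["Paris", "Berlin", "Danube", "Unknown"]
    for label, coverage in candidate_annotations(instances, TOY_KNOWLEDGE_SOURCE):
        print(f"{label}: {coverage:.2f}")
```

A plain vote of this kind tends to favour very general labels (here `Place` covers more instances than `City`), which suggests why ranking candidate annotations needs more than raw frequency.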
Notes
DBpedia: dbpedia.org.
Acknowledgments
This work was partially funded by the French National Research Agency through the CAIR ANR-14-CE23-0006 project.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kellou-Menouer, K., Kedad, Z. (2016). Class Annotation Using Linked Open Data. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_44
DOI: https://doi.org/10.1007/978-3-319-48472-3_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48471-6
Online ISBN: 978-3-319-48472-3
eBook Packages: Computer Science, Computer Science (R0)