Detecting Identical Entities in the Semantic Web Data

Holub, Michal; Proksa, Ondrej; Bieliková, Mária

doi:10.1007/978-3-662-46078-8_43


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8939)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Informatics


Abstract

The large number of entities published by various sources inevitably introduces inaccuracies, mainly duplicated information, which can be found even within a single dataset. In this paper we propose a method for the automatic discovery of identity relationships between two entities (also known as instance matching) in a dataset represented as a graph (e.g. in the Linked Data Cloud). The method can be used to clean existing datasets of duplicates, to validate existing identity relationships between entities within a dataset, or to connect different datasets using the owl:sameAs relationship. It is based on the analysis of sub-graphs formed by entities, their properties, and the existing relationships between them. It can learn a common similarity threshold for a particular dataset, so it adapts to the properties of that dataset. We evaluated our method in several experiments on data from the domains of public administration and digital libraries.
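The abstract does not specify how sub-graphs are compared or how the threshold is learned, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each entity's sub-graph can be reduced to a set of (property, value) pairs, uses Jaccard overlap as a stand-in similarity measure, and picks the threshold that maximizes F1 on a few labeled pairs. All names, identifiers, and data below are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy graph: entity URI -> set of (property, value) pairs.
# This flattening of a sub-graph is an assumption made for illustration.
graph = {
    "ex:p1": {("name", "A. Novak"), ("affiliation", "STU"), ("wrote", "paper1")},
    "ex:p2": {("name", "A. Novak"), ("affiliation", "STU"), ("wrote", "paper2")},
    "ex:p3": {("name", "J. Smith"), ("affiliation", "STU"), ("wrote", "paper3")},
    "ex:p4": {("name", "J. Smith"), ("affiliation", "STU"), ("wrote", "paper3")},
}

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def learn_threshold(graph, labeled_pairs):
    """Return the observed similarity score that gives the best F1
    on labeled_pairs, a list of ((entity_a, entity_b), is_same)."""
    scored = [(jaccard(graph[a], graph[b]), same) for (a, b), same in labeled_pairs]
    best_t, best_f1 = 1.0, -1.0
    for t in sorted({score for score, _ in scored}):
        tp = sum(1 for s, same in scored if s >= t and same)
        fp = sum(1 for s, same in scored if s >= t and not same)
        fn = sum(1 for s, same in scored if s < t and same)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def find_identical(graph, threshold):
    """List entity pairs whose similarity reaches the threshold;
    these are candidates for an owl:sameAs link."""
    return [
        (a, b)
        for a, b in combinations(sorted(graph), 2)
        if jaccard(graph[a], graph[b]) >= threshold
    ]

# Two hand-labeled pairs stand in for training data.
labeled = [(("ex:p1", "ex:p2"), True), (("ex:p1", "ex:p3"), False)]
threshold = learn_threshold(graph, labeled)
candidates = find_identical(graph, threshold)
```

Comparing every pair is quadratic in the number of entities, so a sketch like this only suits small datasets; a practical system would first narrow the candidate pairs, for example by grouping entities that share a property value.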



Author information

Authors and Affiliations

Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology, Ilkovičova 2, 842 16, Bratislava, Slovakia
Michal Holub, Ondrej Proksa & Mária Bieliková


Editor information

Editors and Affiliations

University of Rome Tor Vergata, 00133, Rome, Italy
Giuseppe F. Italiano
University of Limerick, Ireland
Tiziana Margaria-Steffen
Charles University, Prague, Czech Republic
Jaroslav Pokorný
Université catholique de Louvain, Louvain, Belgium
Jean-Jacques Quisquater
ETH Zurich, Zurich, Switzerland
Roger Wattenhofer


Cite this paper

Holub, M., Proksa, O., Bieliková, M. (2015). Detecting Identical Entities in the Semantic Web Data. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, JJ., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_43


DOI: https://doi.org/10.1007/978-3-662-46078-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46077-1
Online ISBN: 978-3-662-46078-8
eBook Packages: Computer Science, Computer Science (R0)
