Abstract
The large number of entities published by various sources inevitably introduces inaccuracies, mainly duplicated information. Duplicates can be found even within a single dataset. In this paper we propose a method for the automatic discovery of identity relationships between pairs of entities (also known as instance matching) in a dataset represented as a graph (e.g. in the Linked Data Cloud). The method can be used to remove duplicates from existing datasets, to validate existing identity relationships between entities within a dataset, or to connect different datasets using the owl:sameAs relationship. It is based on the analysis of sub-graphs formed by entities, their properties and the existing relationships between them. Because it learns a common similarity threshold for each particular dataset, it adapts to the properties of that dataset. We evaluated the method in several experiments on data from the domains of public administration and digital libraries.
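The idea of comparing entity sub-graphs against a learned threshold can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the similarity measure or the learning procedure, so the sketch assumes that each sub-graph is flattened to a set of (property, value) pairs, that similarity is the Jaccard coefficient, and that the threshold is the candidate value with the best F1 score on a few labelled pairs. All names and sample data (`jaccard`, `learn_threshold`, `find_duplicates`, the `graph` dictionary) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: an entity's sub-graph is simplified to a set of
# (property, value) pairs. The paper's actual representation may differ.
Subgraph = Set[Tuple[str, str]]


def jaccard(a: Subgraph, b: Subgraph) -> float:
    """Jaccard similarity of two sub-graphs (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def learn_threshold(graph: Dict[str, Subgraph],
                    labelled: List[Tuple[str, str, bool]]) -> float:
    """Pick the similarity value that maximises F1 on labelled pairs."""
    scored = [(jaccard(graph[x], graph[y]), same) for x, y, same in labelled]
    best_t, best_f1 = 1.0, -1.0
    for t in sorted({s for s, _ in scored}):
        tp = sum(1 for s, same in scored if s >= t and same)
        fp = sum(1 for s, same in scored if s >= t and not same)
        fn = sum(1 for s, same in scored if s < t and same)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def find_duplicates(graph: Dict[str, Subgraph],
                    threshold: float) -> List[Tuple[str, str]]:
    """Return all entity pairs whose similarity reaches the threshold."""
    ids = sorted(graph)
    return [(a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if jaccard(graph[a], graph[b]) >= threshold]


# Hypothetical toy dataset: a1 and a2 describe the same author.
graph = {
    "a1": {("name", "J. Novak"), ("affiliation", "Uni X"), ("wrote", "p1")},
    "a2": {("name", "J. Novak"), ("affiliation", "Uni X"), ("wrote", "p2")},
    "a3": {("name", "A. Smith"), ("affiliation", "Uni Y"), ("wrote", "p3")},
}
labelled = [("a1", "a2", True), ("a1", "a3", False)]

threshold = learn_threshold(graph, labelled)
duplicates = find_duplicates(graph, threshold)
```

Pairs returned by `find_duplicates` would be candidates for an owl:sameAs link. The all-pairs comparison is quadratic and only suitable for small illustrative inputs.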
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Holub, M., Proksa, O., Bieliková, M. (2015). Detecting Identical Entities in the Semantic Web Data. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, JJ., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_43
DOI: https://doi.org/10.1007/978-3-662-46078-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46077-1
Online ISBN: 978-3-662-46078-8
eBook Packages: Computer Science (R0)