
Detecting Identical Entities in the Semantic Web Data

  • Conference paper
SOFSEM 2015: Theory and Practice of Computer Science (SOFSEM 2015)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8939)

Abstract

The large number of entities published by various sources inevitably introduces inaccuracies, mainly duplicated information. Such duplicates can be found even within a single dataset. In this paper we propose a method for the automatic discovery of identity relationships between two entities (also known as instance matching) in a dataset represented as a graph (e.g. in the Linked Data Cloud). Our method can be used to clean existing datasets of duplicates, to validate existing identity relationships between entities within a dataset, or to connect different datasets using the owl:sameAs relationship. It is based on the analysis of sub-graphs formed by entities, their properties, and the existing relationships between them. The method can learn a common similarity threshold for a particular dataset, so it adapts to that dataset's specific properties. We evaluated it in several experiments on data from the domains of public administration and digital libraries.
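To make the general idea concrete, here is a minimal sketch of sub-graph-based instance matching with a learned threshold. It is not the authors' method: the choice of each entity's one-hop neighbourhood as its "sub-graph", Jaccard overlap as the similarity measure, and F1-maximising threshold selection against already known identity pairs (for example, existing owl:sameAs links) are all illustrative assumptions. The entity names, predicates such as `ex:affiliation`, and helper functions like `learn_threshold` are hypothetical.

```python
from itertools import combinations

# Assumption: a dataset is a plain list of (subject, predicate, object) triples.

def entity_subgraph(triples, entity):
    """Collect the entity's one-hop neighbourhood as a set of
    (direction, predicate, neighbour) features (a hypothetical sub-graph model)."""
    features = set()
    for s, p, o in triples:
        if s == entity:
            features.add(("out", p, o))
        if o == entity:
            features.add(("in", p, s))
    return features

def similarity(triples, a, b):
    """Jaccard overlap of the two entities' one-hop sub-graphs."""
    fa, fb = entity_subgraph(triples, a), entity_subgraph(triples, b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0

def score_all_pairs(triples, entities):
    """Score every unordered pair of candidate entities."""
    return {(a, b): similarity(triples, a, b)
            for a, b in combinations(sorted(entities), 2)}

def learn_threshold(scores, known_same):
    """Pick the dataset-wide threshold that maximises F1 against pairs
    already known to be identical (assumed labelled input)."""
    best_t, best_f1 = 1.0, -1.0
    for t in sorted(set(scores.values())):
        predicted = {pair for pair, s in scores.items() if s >= t}
        tp = len(predicted & known_same)
        if tp == 0:
            continue
        precision = tp / len(predicted)
        recall = tp / len(known_same)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def same_as_links(scores, threshold):
    """Emit owl:sameAs triples for every pair at or above the threshold."""
    return [(a, "owl:sameAs", b)
            for (a, b), s in sorted(scores.items()) if s >= threshold]

# Hypothetical toy dataset: ex:p1 and ex:p2 describe the same person.
triples = [
    ("ex:p1", "foaf:name", "J. Smith"),
    ("ex:p1", "ex:affiliation", "ex:uni"),
    ("ex:p2", "foaf:name", "J. Smith"),
    ("ex:p2", "ex:affiliation", "ex:uni"),
    ("ex:p2", "foaf:mbox", "mailto:js@uni.example"),
    ("ex:p3", "foaf:name", "A. Novak"),
    ("ex:p3", "ex:affiliation", "ex:uni"),
]
scores = score_all_pairs(triples, ["ex:p1", "ex:p2", "ex:p3"])
threshold = learn_threshold(scores, {("ex:p1", "ex:p2")})
links = same_as_links(scores, threshold)
print(threshold, links)
```

Because the threshold is chosen per dataset from its own score distribution, the same code can be reused on datasets whose entities are described more or less richly; a real system would also need blocking to avoid scoring every pair.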




Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Holub, M., Proksa, O., Bieliková, M. (2015). Detecting Identical Entities in the Semantic Web Data. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, JJ., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_43


  • DOI: https://doi.org/10.1007/978-3-662-46078-8_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46077-1

  • Online ISBN: 978-3-662-46078-8

  • eBook Packages: Computer Science, Computer Science (R0)
