K-Radius Subgraph Comparison for RDF Data Cleansing

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Abstract

With the rapid development of semantic web technology, the explosive growth of RDF data has become a challenging problem. Because RDF data usually come from different sources that may overlap with each other, they can contain duplicates. These duplicates may cause ambiguity and even errors in reasoning. However, this problem has received little attention. In this paper, we study the problem and propose a solution named K-radius subgraph comparison (KSC). The proposed method is based on the RDF-Hierarchical Graph Model. KSC combines similarity measurement with comparison of context to detect duplicates in RDF data. Experiments on publication datasets show that the proposed method is efficient at detecting duplicates in RDF data. KSC is also simpler and less time-consuming than other graph comparison methods.
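The abstract does not spell out the KSC algorithm, so the following is only a minimal sketch of the general idea of comparing the context found within k hops of two resources. It is not the authors' method: the function names, the choice of following outgoing edges only, the Jaccard measure, and the toy triples are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Index (subject, predicate, object) triples by subject (outgoing edges only)."""
    adj = defaultdict(set)
    for s, p, o in triples:
        adj[s].add((p, o))
    return adj


def k_radius_context(adj, start, k):
    """Collect the (predicate, node) pairs reachable within k hops of start."""
    seen = {start}
    context = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand nodes that already sit on the radius
        for pred, nxt in adj.get(node, ()):
            context.add((pred, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return context


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_similarity(adj, x, y, k):
    """Assumed duplicate score: overlap of the k-radius contexts of x and y."""
    return jaccard(k_radius_context(adj, x, k), k_radius_context(adj, y, k))


# Hypothetical publication-style data: a1 and a2 share a name and a venue.
triples = [
    ("a1", "name", "A. Author"),
    ("a2", "name", "A. Author"),
    ("a3", "name", "B. Writer"),
    ("a1", "wrote", "p1"),
    ("a2", "wrote", "p2"),
    ("a3", "wrote", "p3"),
    ("p1", "venue", "WAIM"),
    ("p2", "venue", "WAIM"),
    ("p3", "venue", "ICDE"),
]

adj = build_adjacency(triples)
print(context_similarity(adj, "a1", "a2", k=2))  # candidate duplicate pair
print(context_similarity(adj, "a1", "a3", k=2))  # unrelated pair
```

In this sketch a pair would be flagged as a candidate duplicate when its score exceeds some threshold; how the paper actually weights similarity against context is not stated in the abstract.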

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, H., Huang, L., Yuan, P. (2010). K-Radius Subgraph Comparison for RDF Data Cleansing. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_31

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science, Computer Science (R0)
