Abstract
With the rapid development of semantic web technology, the explosive growth of RDF data has become a challenging problem. Because RDF data usually come from different sources that may overlap with each other, they can contain duplicates. These duplicates may cause ambiguity and even errors in reasoning. However, little attention has been paid to this problem. In this paper, we study the problem and propose a solution named K-radius subgraph comparison (KSC). The proposed method is based on the RDF-Hierarchical Graph Model. KSC combines similarity measures with context comparison to detect duplicates in RDF data. Experiments on publication datasets show that the proposed method is efficient at detecting duplicates in RDF data. KSC is also simpler and less time-consuming than other graph comparison methods.
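The abstract does not spell out how KSC is computed, so the following is only a rough sketch of the general idea of comparing two RDF nodes by a label similarity plus the overlap of their k-hop neighbourhoods. Everything in it is an assumption made for illustration: the function names (`k_radius_context`, `jaccard`, `duplicate_score`), the use of `difflib.SequenceMatcher` for label similarity, the Jaccard overlap for context, the equal weighting, and the toy triples. It does not implement the RDF-Hierarchical Graph Model and should not be read as the authors' algorithm.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher

# Toy publication data (hypothetical): a1 and a2 share a venue and a co-author,
# while a3 has the same label as a1 but an unrelated context.
TRIPLES = [
    ("p1", "creator", "a1"), ("p1", "creator", "c1"), ("p1", "venue", "WAIM"),
    ("p2", "creator", "a2"), ("p2", "creator", "c1"), ("p2", "venue", "WAIM"),
    ("p3", "creator", "a3"), ("p3", "venue", "ICDE"),
]
LABELS = {"a1": "H. Jin", "a2": "Hai Jin", "a3": "H. Jin"}


def k_radius_context(triples, root, k):
    """Return (distance, predicate, node) features within k hops of root."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))  # assumption: edge direction is ignored
    seen = {root}
    queue = deque([(root, 0)])
    features = set()
    while queue:
        node, dist = queue.popleft()
        if dist == k:
            continue  # do not expand beyond radius k
        for pred, nxt in adj[node]:
            if nxt == root:
                continue  # the root itself is not part of its own context
            features.add((dist + 1, pred, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return features


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_score(triples, labels, a, b, k=2, weight=0.5):
    """Weighted mix of label similarity and k-radius context overlap."""
    name_sim = SequenceMatcher(None, labels[a], labels[b]).ratio()
    ctx_sim = jaccard(k_radius_context(triples, a, k),
                      k_radius_context(triples, b, k))
    return weight * name_sim + (1 - weight) * ctx_sim


if __name__ == "__main__":
    print("a1 vs a2:", duplicate_score(TRIPLES, LABELS, "a1", "a2"))
    print("a1 vs a3:", duplicate_score(TRIPLES, LABELS, "a1", "a3"))
```

In this toy setting the pair with overlapping context scores higher than the pair that merely shares a label, which is the kind of behaviour a context-aware duplicate detector aims for. The radius `k` and the weight are free parameters here; the paper may choose or combine them quite differently.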
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, H., Huang, L., Yuan, P. (2010). K-Radius Subgraph Comparison for RDF Data Cleansing. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_31
DOI: https://doi.org/10.1007/978-3-642-14246-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science (R0)