Abstract
In many applications, several references may refer to the same real-world entity. The task of reference reconciliation is to group these references into clusters so that each cluster corresponds to exactly one real-world entity. In this paper we propose a new method for reference reconciliation: in addition to the traditional attribute-value similarity, we use record-level relationships to compute association similarity values between references in graphs. We then combine this similarity with the attribute-value similarity and apply a clustering algorithm to group the closest references.
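The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measures, the combination rule, or the clustering algorithm, so everything concrete here is an assumption made for illustration: attribute similarity is a generic string ratio, association similarity is the Jaccard overlap of the records linked to each reference, the two are combined as a weighted sum with a hypothetical weight `alpha`, and clustering is a simple threshold-based single-link merge. None of these choices should be read as the paper's actual method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a, b):
    """String similarity of two attribute values in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def association_similarity(neighbors, i, j):
    """Jaccard overlap of the records linked to references i and j (assumed measure)."""
    ni, nj = neighbors[i], neighbors[j]
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


def reconcile(names, neighbors, alpha=0.5, threshold=0.6):
    """Group references whose combined similarity reaches the threshold.

    alpha and threshold are hypothetical parameters, not taken from the paper.
    """
    parent = list(range(len(names)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        score = (alpha * attribute_similarity(names[i], names[j])
                 + (1 - alpha) * association_similarity(neighbors, i, j))
        if score >= threshold:
            parent[find(i)] = find(j)  # single-link merge of the two clusters

    clusters = {}
    for i in range(len(names)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Three references with the same name; only the first two share linked records.
names = ["Wei Wang", "Wei Wang", "Wei Wang"]
neighbors = [{"paper1", "paper2"}, {"paper1", "paper2"}, {"paper9"}]
print(reconcile(names, neighbors))  # [[0, 1], [2]]
```

In this toy input the attribute values are identical, so attribute similarity alone cannot separate the references; the relationship-based term is what keeps the third reference in its own cluster.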
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yongqing, Z., Qing, K., Guoqing, D. (2010). A Graphical Method for Reference Reconciliation. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_16
DOI: https://doi.org/10.1007/978-3-642-14589-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14588-9
Online ISBN: 978-3-642-14589-6
eBook Packages: Computer Science (R0)