A Graphical Method for Reference Reconciliation

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)

Abstract

In many applications, several references may refer to the same real-world entity. The task of reference reconciliation is to group these references into clusters so that each cluster corresponds to exactly one real-world entity. In this paper we propose a new method for reference reconciliation: in addition to the traditional attribute-value similarity, we exploit record-level relationships to compute the association similarity between references in graphs. We then combine this association similarity with the attribute-value similarity and apply a clustering algorithm to group the closest references.
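The abstract does not specify the similarity measures, the combination rule, or the clustering algorithm, so the Python sketch below fills each step with a simple stand-in of our own choosing to show how the pieces fit together. The assumptions are: token-level Jaccard similarity for attribute values, Jaccard similarity of neighbor sets in a relationship graph for association similarity, a linear weighting controlled by a hypothetical parameter `alpha`, and threshold-based single-link clustering (union-find) with a hypothetical `threshold`. All function names and the toy data are illustrative; this is not the authors' algorithm.

```python
from itertools import combinations

# Illustrative sketch only: every measure and parameter below is an assumption,
# not the method described in the paper.


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attribute_similarity(attrs_a: dict, attrs_b: dict) -> float:
    """Assumed attribute-value similarity: mean token Jaccard over shared attribute keys."""
    keys = attrs_a.keys() & attrs_b.keys()
    if not keys:
        return 0.0
    total = sum(
        jaccard(set(str(attrs_a[k]).lower().split()),
                set(str(attrs_b[k]).lower().split()))
        for k in keys
    )
    return total / len(keys)


def association_similarity(graph: dict, a: str, b: str) -> float:
    """Assumed association similarity: Jaccard of a's and b's neighbor sets in the graph."""
    return jaccard(graph.get(a, set()), graph.get(b, set()))


def reconcile(refs: dict, graph: dict, alpha: float = 0.5, threshold: float = 0.5) -> list:
    """Single-link clustering: merge every pair whose combined score >= threshold.

    alpha (hypothetical) weights attribute similarity against association similarity.
    """
    parent = {r: r for r in refs}  # union-find forest, one root per cluster

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(refs, 2):
        score = (alpha * attribute_similarity(refs[a], refs[b])
                 + (1 - alpha) * association_similarity(graph, a, b))
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for r in refs:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Hypothetical toy data: three "Wei Wang" references and one "J. Smith".
    refs = {
        "r1": {"name": "Wei Wang"},
        "r2": {"name": "Wei Wang"},
        "r3": {"name": "Wei Wang"},
        "r4": {"name": "J. Smith"},
    }
    # Hypothetical record-level relationships (co-authors, venues) as neighbor sets.
    graph = {
        "r1": {"coauthor:Alice", "venue:SIGMOD"},
        "r2": {"coauthor:Alice", "venue:VLDB"},
        "r3": {"coauthor:Bob", "venue:ICML"},
        "r4": {"coauthor:Bob"},
    }
    print(reconcile(refs, graph, alpha=0.5, threshold=0.6))
    # → [['r1', 'r2'], ['r3'], ['r4']]
```

In this toy run, r1 and r2 share both a name and a co-author, so their combined score (0.5 · 1 + 0.5 · 1/3 ≈ 0.67) clears the threshold. r3 has the same name but no shared relationships, scores only 0.5 against either, and stays in its own cluster: attribute similarity alone (`alpha=1.0`) would have merged all three. Single-link merging is just one possible choice here; it is simple but can chain dissimilar references together through intermediate ones.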


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yongqing, Z., Qing, K., Guoqing, D. (2010). A Graphical Method for Reference Reconciliation. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_16

  • DOI: https://doi.org/10.1007/978-3-642-14589-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer Science, Computer Science (R0)
