GB-JER: A Graph-Based Model for Joint Entity Resolution

  • Conference paper

Database Systems for Advanced Applications (DASFAA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9049)

Abstract

Resolving multiple classes of related entity representations jointly improves the accuracy of entity resolution. We propose GB-JER, a graph-based joint entity resolution model that exploits a dynamic entity representation relationship graph. Whenever a pair of representations is matched, GB-JER contracts the neighborhood of the matched pair; the resulting semantic enrichment provides new evidence for subsequent entity resolution, and the process repeats iteratively. GB-JER is also an incremental approach. Our experimental evaluation shows that GB-JER is more accurate than the existing state-of-the-art joint entity resolution approach.
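
The abstract describes the mechanism only at a high level, so the following is a minimal, hypothetical sketch of graph-based joint entity resolution with neighborhood contraction, not the authors' GB-JER algorithm. Everything concrete here is an assumption: the class RepGraph, the function resolve, token-set Jaccard similarity for both attributes and neighbor sets, the mixing weight alpha, the threshold, and the greedy best-pair loop. Two classes of representations (papers and authors) are matched jointly: each merge contracts two nodes into one, so related representations come to share neighbors and gain relational evidence for later rounds. For simplicity the sketch rescans every pair after each merge, so unlike GB-JER it is not incremental.

```python
# Hypothetical sketch: NOT the GB-JER algorithm from the paper.
# All names, similarity measures and parameters are illustrative assumptions.
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class RepGraph:
    """Representation graph: typed nodes, token attributes, undirected relationship edges."""

    def __init__(self):
        self.cls = {}      # node id -> entity class, e.g. "paper" or "author"
        self.tokens = {}   # node id -> set of lower-cased attribute tokens
        self.adj = {}      # node id -> set of neighbouring node ids
        self.members = {}  # node id -> original representations contracted into it

    def add_node(self, nid, cls, text):
        self.cls[nid] = cls
        self.tokens[nid] = set(text.lower().split())
        self.adj[nid] = set()
        self.members[nid] = {nid}

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def contract(self, keep, drop):
        """Merge `drop` into `keep`: union tokens and members, redirect drop's edges to keep."""
        self.tokens[keep] |= self.tokens.pop(drop)
        self.members[keep] |= self.members.pop(drop)
        del self.cls[drop]
        for n in self.adj.pop(drop):
            self.adj[n].discard(drop)
            if n != keep:  # avoid a self-loop when keep and drop were adjacent
                self.adj[n].add(keep)
                self.adj[keep].add(n)


def score(g, u, v, alpha):
    """Weighted mix of attribute similarity and neighbourhood (relational) similarity."""
    return alpha * jaccard(g.tokens[u], g.tokens[v]) + (1 - alpha) * jaccard(g.adj[u], g.adj[v])


def resolve(g, threshold=0.55, alpha=0.6):
    """Greedy loop: contract the best same-class pair above threshold, then rescore all pairs."""
    merges = []
    while True:
        best, pair = -1.0, None
        for u, v in combinations(sorted(g.cls), 2):
            if g.cls[u] != g.cls[v]:
                continue  # only representations of the same class can match
            s = score(g, u, v, alpha)
            if s >= threshold and s > best:
                best, pair = s, (u, v)
        if pair is None:
            return merges
        g.contract(*pair)
        merges.append(pair)


# Toy bibliographic data: two near-duplicate papers, each with two authors.
g = RepGraph()
g.add_node("p1", "paper", "joint entity resolution")
g.add_node("p2", "paper", "joint entity resolution model")
g.add_node("a1", "author", "chen sun")
g.add_node("a2", "author", "c sun")
g.add_node("a3", "author", "ge yu")
g.add_node("a4", "author", "ge yu")
for paper, author in [("p1", "a1"), ("p1", "a3"), ("p2", "a2"), ("p2", "a4")]:
    g.add_edge(paper, author)

print(resolve(g))  # [('a3', 'a4'), ('p1', 'p2'), ('a1', 'a2')]
```

In this toy run, "chen sun" and "c sun" score only about 0.2 at first, too low to match on attributes alone. Once the duplicate co-authors and then the duplicate papers have been contracted, the two author nodes share a neighbor and their score rises to about 0.6, above the threshold, which illustrates how earlier merges can supply evidence for later ones.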

Author information

Correspondence to Chenchen Sun.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sun, C., Shen, D., Kou, Y., Nie, T., Yu, G. (2015). GB-JER: A Graph-Based Model for Joint Entity Resolution. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_27

  • DOI: https://doi.org/10.1007/978-3-319-18120-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18119-6

  • Online ISBN: 978-3-319-18120-2

  • eBook Packages: Computer Science, Computer Science (R0)