Abstract
Resolving multiple classes of related entity representations jointly improves the accuracy of entity resolution. We propose GB-JER, a graph-based joint entity resolution model that exploits a dynamic entity representation relationship graph. When a pair is matched, GB-JER contracts the neighborhood of the matched pair in the graph; the resulting semantic enrichment provides new evidence for subsequent entity resolution, and the process repeats iteratively. GB-JER is also an incremental approach. Experimental evaluation shows that GB-JER outperforms the existing state-of-the-art joint entity resolution approach in accuracy.
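The contraction idea in the abstract can be illustrated with a minimal sketch. This is not the GB-JER algorithm itself, whose details the abstract does not give; the class `RepresentationGraph`, the function `shared_neighbors`, and the node names are hypothetical, chosen only to show how merging a matched pair can create new evidence for other pairs.

```python
from collections import defaultdict


class RepresentationGraph:
    """Undirected graph whose nodes are entity representations (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def contract(self, keep, drop):
        """Merge node `drop` into `keep`, unioning their neighborhoods."""
        for n in self.adj.pop(drop, set()):
            self.adj[n].discard(drop)
            if n != keep:  # avoid creating a self-loop on `keep`
                self.adj[n].add(keep)
                self.adj[keep].add(n)
        self.adj[keep].discard(drop)


def shared_neighbors(g, u, v):
    """Count common neighbors; a crude stand-in for relational evidence."""
    return len(g.adj[u] & g.adj[v])


# Hypothetical data: two paper records, each linked to an author record
# and a venue record that differ only in surface form.
g = RepresentationGraph()
g.add_edge("p1", "a1")  # a1 = "J. Smith"
g.add_edge("p1", "v1")  # v1 = "VLDB"
g.add_edge("p2", "a2")  # a2 = "John Smith"
g.add_edge("p2", "v2")  # v2 = "Very Large Data Bases"

before = shared_neighbors(g, "p1", "p2")       # no shared context yet
g.contract("a1", "a2")                         # the author pair is matched
after_authors = shared_neighbors(g, "p1", "p2")
g.contract("v1", "v2")                         # the venue pair is matched
after_venues = shared_neighbors(g, "p1", "p2")
```

In this toy case the two paper records share no neighbors at first; each contraction of a matched author or venue pair adds one shared neighbor, which is the kind of evidence a later iteration could use to match the papers themselves.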
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sun, C., Shen, D., Kou, Y., Nie, T., Yu, G. (2015). GB-JER: A Graph-Based Model for Joint Entity Resolution. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_27
DOI: https://doi.org/10.1007/978-3-319-18120-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science (R0)