Definition
A fundamental problem in data cleaning and integration (see Data Preparation) is dealing with uncertain and imprecise references to real-world entities. The goal of entity resolution is to take a collection of uncertain entity references (or references, in short) from a single data source or multiple data sources, discover the unique set of underlying entities, and map each reference to its corresponding entity. This typically involves two subproblems: identification, which recognizes references with different attributes that refer to the same entity, and disambiguation, which assigns references with identical attributes to different entities.
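As a rough illustration of the mapping from references to entities, the following minimal Python sketch groups references whose strings are sufficiently similar. It is not a method described in this entry: the function names `similarity` and `resolve`, the use of `difflib.SequenceMatcher` as the similarity measure, the threshold of 0.8, and the sample names are all assumptions chosen for illustration. The sketch covers only the identification subproblem; disambiguating references with identical attributes would need additional context, such as relationships between references, which plain string comparison cannot supply.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two reference strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(references: list[str], threshold: float = 0.8) -> list[int]:
    """Map each reference to an entity id by merging sufficiently similar pairs.

    The threshold is an illustrative assumption, not a recommended value.
    """
    # Union-find forest: every reference starts as its own entity.
    parent = list(range(len(references)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare all pairs; merge the two groups when the pair looks alike.
    for i in range(len(references)):
        for j in range(i + 1, len(references)):
            if similarity(references[i], references[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel group roots as consecutive entity ids in order of appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(references))]


# Hypothetical references: the first two differ by one character.
refs = ["Jonathan Smith", "Jonathan Smyth", "Mary Jones"]
entity_ids = resolve(refs)
for ref, entity in zip(refs, entity_ids):
    print(f"{ref!r} -> entity {entity}")
```

Comparing all pairs is quadratic in the number of references, so this sketch is only practical for small collections. Because groups are merged transitively, a chain of near matches can also link two references that are not themselves similar.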
Motivation and Background
Entity resolution is a common problem that comes up in different guises (and is given different names) in many computer science domains. Examples include computer...
© 2011 Springer Science+Business Media, LLC
About this entry
Cite this entry
Bhattacharya, I., Getoor, L. (2011). Entity Resolution. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_254
DOI: https://doi.org/10.1007/978-0-387-30164-8_254
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering