Abstract
Data integration on a human-manageable scale, by users without database expertise, is a more common activity than integration of large databases. Users often gather fine-grained data and organize it in an entity-centric way, developing tables of information regarding real-world objects, ideas, or people. Often, they do this by copying and pasting bits of data from e-mails, databases, or text files into a spreadsheet. During this process, users evolve their notions of entities and attributes. They combine sets of entities or attributes, split them again, update attribute values, and retract those updates. These functions are neither well supported by current tools, nor formally well understood. Our research seeks to capture and make explicit the data integration decisions made during these activities. In this paper, we formally define entity resolution and de-resolution, and show that these functions behave predictably and intuitively in the presence of attribute value updates.
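The operations named in the abstract (combining entities, splitting them again, updating attribute values, and retracting updates) can be illustrated with a minimal sketch. This is not the paper's formalization: the `Workspace` class, its method names, and the choice to record updates against source records rather than merged entities are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical model; every name here is illustrative, not from the paper."""
    # entity id -> set of source-record ids that make up the entity
    members: dict = field(default_factory=dict)
    # source-record id -> list of (attribute, value) updates, newest last
    updates: dict = field(default_factory=dict)
    # merged entity id -> the member sets of the two entities it replaced
    merges: dict = field(default_factory=dict)

    def add(self, record_id):
        self.members[record_id] = {record_id}
        self.updates[record_id] = []

    def update(self, record_id, attribute, value):
        self.updates[record_id].append((attribute, value))

    def retract(self, record_id, attribute):
        # Drop the most recent update to this attribute on this record.
        log = self.updates[record_id]
        for i in range(len(log) - 1, -1, -1):
            if log[i][0] == attribute:
                del log[i]
                return

    def resolve(self, a, b):
        # Combine two entities, remembering what they were before the merge.
        merged = f"{a}+{b}"
        self.merges[merged] = {a: self.members.pop(a), b: self.members.pop(b)}
        self.members[merged] = set().union(*self.merges[merged].values())
        return merged

    def deresolve(self, merged):
        # Undo a merge: restore the two entities it replaced.
        del self.members[merged]
        self.members.update(self.merges.pop(merged))

    def view(self, entity):
        # attribute -> set of current values contributed by member records
        out = {}
        for rec in sorted(self.members[entity]):
            latest = {}
            for attr, val in self.updates[rec]:
                latest[attr] = val
            for attr, val in latest.items():
                out.setdefault(attr, set()).add(val)
        return out
```

Because updates are stored per source record in this sketch, an update made while two records are merged is still visible on the right record after the merge is undone. That is one simple way to obtain the kind of predictable behavior the abstract describes, under the assumptions stated above.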
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Archer, D.W., Delcambre, L.M.L. (2008). Definition and Formalization of Entity Resolution Functions for Everyday Information Integration. In: Schewe, KD., Thalheim, B. (eds) Semantics in Data and Knowledge Bases. SDKB 2008. Lecture Notes in Computer Science, vol 4925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88594-8_7
DOI: https://doi.org/10.1007/978-3-540-88594-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88593-1
Online ISBN: 978-3-540-88594-8
eBook Packages: Computer Science, Computer Science (R0)