Definition and Formalization of Entity Resolution Functions for Everyday Information Integration

  • Conference paper
Semantics in Data and Knowledge Bases (SDKB 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4925))

Abstract

Data integration on a human-manageable scale, by users without database expertise, is a more common activity than integration of large databases. Users often gather fine-grained data and organize it in an entity-centric way, developing tables of information regarding real-world objects, ideas, or people. Often, they do this by copying and pasting bits of data from e-mails, databases, or text files into a spreadsheet. During this process, users evolve their notions of entities and attributes. They combine sets of entities or attributes, split them again, update attribute values, and retract those updates. These functions are neither well supported by current tools, nor formally well understood. Our research seeks to capture and make explicit the data integration decisions made during these activities. In this paper, we formally define entity resolution and de-resolution, and show that these functions behave predictably and intuitively in the presence of attribute value updates.
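As an informal illustration of the operations the abstract names, the following Python sketch models entity resolution (combining entities), de-resolution (splitting them again), and attribute value updates. It is not the paper's formalization: the `EntityTable` class, its method names, the union-of-values merged view, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One row about a real-world thing: attribute name -> set of values."""
    attrs: dict[str, set[str]] = field(default_factory=dict)


class EntityTable:
    """Hypothetical store that remembers merges so they can be undone."""

    def __init__(self) -> None:
        self.base: dict[str, Entity] = {}       # entities as originally entered
        self.groups: dict[str, list[str]] = {}  # group id -> member base ids

    def add(self, eid: str, **attrs: str) -> None:
        self.base[eid] = Entity({k: {v} for k, v in attrs.items()})
        self.groups[eid] = [eid]  # each new entity starts as its own group

    def resolve(self, a: str, b: str, new_id: str) -> None:
        """Declare that groups a and b describe the same real-world entity."""
        self.groups[new_id] = self.groups.pop(a) + self.groups.pop(b)

    def deresolve(self, gid: str) -> list[str]:
        """Undo a merge: every member becomes its own group again.

        Simplification (assumption): a nested merge is split all the way
        down to base entities, not back to its two previous groups.
        """
        members = self.groups.pop(gid)
        for m in members:
            self.groups[m] = [m]
        return members

    def update(self, base_id: str, attr: str, value: str) -> None:
        """Overwrite one attribute value on a base entity."""
        self.base[base_id].attrs[attr] = {value}

    def view(self, gid: str) -> dict[str, set[str]]:
        """Merged view: per attribute, the union of the members' values."""
        merged: dict[str, set[str]] = {}
        for m in self.groups[gid]:
            for attr, values in self.base[m].attrs.items():
                merged.setdefault(attr, set()).update(values)
        return merged


# Usage with made-up data: merge two rows, update one, then split again.
table = EntityTable()
table.add("e1", name="A. Smith", email="a@example.org")
table.add("e2", name="Alice Smith")
table.resolve("e1", "e2", "p1")
table.update("e2", "name", "Alice B. Smith")
print(table.view("p1"))   # merged view reflects the update
table.deresolve("p1")
print(table.view("e2"))   # the update survives the split
```

Because the sketch stores updates on the base entities and computes the merged view on demand, an update made while two entities are resolved is still visible after they are de-resolved. This is one simple way to get predictable behavior, chosen for brevity; the paper's own definitions may differ.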

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Archer, D.W., Delcambre, L.M.L. (2008). Definition and Formalization of Entity Resolution Functions for Everyday Information Integration. In: Schewe, KD., Thalheim, B. (eds) Semantics in Data and Knowledge Bases. SDKB 2008. Lecture Notes in Computer Science, vol 4925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88594-8_7

  • DOI: https://doi.org/10.1007/978-3-540-88594-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88593-1

  • Online ISBN: 978-3-540-88594-8

  • eBook Packages: Computer Science, Computer Science (R0)
