An Alignment Comparator for Entity Resolution with Multi-valued Attributes

  • Conference paper
Nature-Inspired Computation and Machine Learning (MICAI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8857)

Abstract

Entity matching is a problem that concerns many data management processes. When the entities to be matched are represented as RDF individuals, some properties may hold variable-length lists of attribute values, which leads to the problem of comparing multi-valued attributes, e.g. comparing lists of author names to determine whether two publications match. This kind of matching is more complex than comparing fixed-length records, but less complex than comparing XML documents. Instead of comparing a single string that represents the concatenation of these values, each value of one vector should be compared against all values of the other vector. We propose a set of heuristics to address the alignment and comparison of multi-valued attributes and evaluate them in the context of bibliographic databases. Our first results show that it is possible to reduce the number of comparisons and to provide an aggregated similarity metric that outperforms the average similarity of cross-product comparisons.
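
The contrast drawn in the abstract, between averaging all cross-product comparisons and aggregating over an alignment of the two value lists, can be illustrated with a minimal sketch. It is not the authors' method: the paper's heuristics are not given in this preview. The greedy one-to-one alignment, the function names, and the use of Python's `difflib.SequenceMatcher` as the string similarity are all assumptions made for illustration. This naive version still computes every pairwise similarity, so it does not show the reduction in comparisons that the abstract reports.

```python
from difflib import SequenceMatcher


def sim(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (assumption, not the paper's metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_product_average(xs: list[str], ys: list[str]) -> float:
    """Baseline: mean similarity over every pair (x, y)."""
    scores = [sim(x, y) for x in xs for y in ys]
    return sum(scores) / len(scores) if scores else 0.0


def greedy_alignment(xs: list[str], ys: list[str]) -> list[tuple[int, int, float]]:
    """Hypothetical heuristic: repeatedly pick the best remaining pair,
    using each value of either list at most once."""
    pairs = sorted(
        ((sim(x, y), i, j) for i, x in enumerate(xs) for j, y in enumerate(ys)),
        reverse=True,
    )
    used_i: set[int] = set()
    used_j: set[int] = set()
    aligned = []
    for score, i, j in pairs:
        if i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            aligned.append((i, j, score))
    return aligned


def aligned_similarity(xs: list[str], ys: list[str]) -> float:
    """Sum of aligned-pair similarities, divided by the longer list's length
    so that unmatched values lower the score."""
    if not xs or not ys:
        return 0.0
    total = sum(score for _, _, score in greedy_alignment(xs, ys))
    return total / max(len(xs), len(ys))


if __name__ == "__main__":
    a = ["Smith, J.", "Doe, A."]
    b = ["Doe, A.", "Smith, J."]  # same authors, different order
    print(aligned_similarity(a, b), cross_product_average(a, b))
```

For two author lists that contain the same names in a different order, the aligned score is 1.0, while the cross-product average is pulled down by the mismatched pairs. An optimal assignment, for instance via the Hungarian algorithm, could replace the greedy step at higher computational cost.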

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Mazzucchi-Augel, P.N., Ceballos, H.G. (2014). An Alignment Comparator for Entity Resolution with Multi-valued Attributes. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Nature-Inspired Computation and Machine Learning. MICAI 2014. Lecture Notes in Computer Science, vol 8857. Springer, Cham. https://doi.org/10.1007/978-3-319-13650-9_25

  • DOI: https://doi.org/10.1007/978-3-319-13650-9_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13649-3

  • Online ISBN: 978-3-319-13650-9

  • eBook Packages: Computer Science (R0)
