
Evaluation of Candidate Pair Generation Strategies in Entity Matching

  • Conference paper
Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2023)

Abstract

Entity Matching (EM) is the task of identifying and linking records from different sources that refer to the same real-world entity, and it is a fundamental step in data integration. Such matching is expected to play a pivotal role in improving the accuracy and reliability of downstream data analytics tasks. An EM procedure typically comprises two stages: blocking and matching. This study focuses on the blocking phase, specifically on candidate pair generation, and explores different techniques for generating candidate pairs from the sources during that phase. The proposed work is evaluated experimentally on two benchmark datasets, DBLP-ACM and Amazon-GoogleProducts.
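To make the idea of candidate pair generation concrete, the Python below is a small, hedged sketch; it is not the authors' implementation and does not reproduce their experiments. It contrasts four strategies that are common in the blocking literature (exhaustive pairing, standard blocking, token blocking and sorted neighborhood) and scores each with pair completeness (PC) and reduction ratio (RR), two widely used blocking metrics. The record texts, ground-truth matches, blocking key (first word) and window size are all hypothetical choices made for illustration, not values taken from DBLP-ACM or Amazon-GoogleProducts.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records standing in for two sources (NOT the benchmark data).
source_a = {
    "a1": "efficient similarity joins",
    "a2": "deep entity matching",
    "a3": "blocking for record linkage",
}
source_b = {
    "b1": "entity matching with deep learning",
    "b2": "record linkage blocking methods",
    "b3": "query optimization for databases",
}
# Hypothetical ground truth, used only to score the strategies.
true_matches = {("a2", "b1"), ("a3", "b2")}


def all_pairs(a, b):
    """No blocking: every cross-source pair is a candidate (|A| x |B| pairs)."""
    return set(product(a, b))


def standard_blocking(a, b, key=lambda text: text.split()[0]):
    """Pair records whose blocking-key values are equal.
    The default key (first word) is an arbitrary illustrative assumption."""
    blocks_a, blocks_b = defaultdict(set), defaultdict(set)
    for rid, text in a.items():
        blocks_a[key(text)].add(rid)
    for rid, text in b.items():
        blocks_b[key(text)].add(rid)
    pairs = set()
    for k in blocks_a.keys() & blocks_b.keys():
        pairs |= set(product(blocks_a[k], blocks_b[k]))
    return pairs


def token_blocking(a, b):
    """Pair records that share at least one whitespace token."""
    index = defaultdict(set)  # token -> ids in source b containing it
    for rid, text in b.items():
        for tok in set(text.split()):
            index[tok].add(rid)
    pairs = set()
    for rid, text in a.items():
        for tok in set(text.split()):
            pairs |= {(rid, other) for other in index.get(tok, set())}
    return pairs


def sorted_neighborhood(a, b, window=3, key=lambda text: text):
    """Sort all records by key, then pair cross-source records that fall
    inside a sliding window covering `window` consecutive positions."""
    merged = sorted(
        [(key(t), "A", rid) for rid, t in a.items()]
        + [(key(t), "B", rid) for rid, t in b.items()]
    )
    pairs = set()
    for i in range(len(merged)):
        for j in range(i + 1, min(i + window, len(merged))):
            _, src_i, id_i = merged[i]
            _, src_j, id_j = merged[j]
            if src_i != src_j:  # only cross-source pairs are candidates
                pairs.add((id_i, id_j) if src_i == "A" else (id_j, id_i))
    return pairs


def evaluate(candidates, a, b, truth):
    """PC: share of true matches kept as candidates.
    RR: share of the |A| x |B| comparison space that is pruned."""
    pc = len(candidates & truth) / len(truth)
    rr = 1 - len(candidates) / (len(a) * len(b))
    return pc, rr


if __name__ == "__main__":
    strategies = {
        "all pairs": all_pairs(source_a, source_b),
        "standard blocking": standard_blocking(source_a, source_b),
        "token blocking": token_blocking(source_a, source_b),
        "sorted neighborhood": sorted_neighborhood(source_a, source_b),
    }
    for name, cands in strategies.items():
        pc, rr = evaluate(cands, source_a, source_b, true_matches)
        print(f"{name:20s} candidates={len(cands)} PC={pc:.2f} RR={rr:.2f}")
```

On this toy data, exhaustive pairing keeps all 9 pairs, the first-word key in standard blocking discards both true matches, sorted neighborhood on the full title keeps one of the two, and token blocking keeps both with a single spurious candidate. This illustrates the trade-off between recall of true matches and reduction of the comparison space that any candidate pair generation strategy has to balance.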



Acknowledgements

We are most thankful to the Faculty of Engineering, Chiang Mai University, for supporting this study. We also extend our sincere appreciation to the Chiang Mai University Presidential Scholarship for its financial support, which contributed greatly to the completion of this study.

Author information

Corresponding author

Correspondence to Kittayaporn Chantaranimi.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chantaranimi, K., Natwichai, J. (2024). Evaluation of Candidate Pair Generation Strategies in Entity Matching. In: Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 189. Springer, Cham. https://doi.org/10.1007/978-3-031-46970-1_11


  • DOI: https://doi.org/10.1007/978-3-031-46970-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46969-5

  • Online ISBN: 978-3-031-46970-1

  • eBook Packages: Engineering, Engineering (R0)
