Abstract
Entity Matching (EM) is the task of identifying records from different sources that refer to the same real-world entity, and it is a fundamental step in data integration. Accurate matching is expected to improve the accuracy and reliability of downstream data analytics tasks. An EM procedure typically comprises two stages: blocking and matching. This study focuses on the blocking stage, specifically on candidate pair generation, and explores different techniques for generating candidate pairs from the sources. The techniques are evaluated experimentally on two benchmark datasets, DBLP-ACM and Amazon-GoogleProducts.
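The abstract does not say which candidate pair generation strategies the authors compare, so the following is only a minimal sketch of the general idea, not the authors' method. It contrasts a no-blocking baseline (the full Cartesian product of the two sources) with two common blocking techniques: standard blocking on a single key and token blocking. The record titles, the first-token blocking key (`blocking_key`), and the ground-truth matches are all hypothetical. Candidates are scored with two standard blocking measures, reduction ratio (the fraction of the Cartesian product that is pruned) and pair completeness (the fraction of true matches that survive blocking).

```python
from collections import defaultdict
from itertools import product


def cartesian_pairs(left, right):
    """No blocking: every record in `left` is paired with every record in `right`."""
    return set(product(left, right))


def blocking_key(title):
    """Hypothetical blocking key: the first lowercase token of the title."""
    tokens = title.lower().split()
    return tokens[0] if tokens else ""


def standard_blocking_pairs(left, right):
    """Standard blocking: pair records only if their blocking keys are equal."""
    blocks = defaultdict(list)
    for b_id, title in right.items():
        blocks[blocking_key(title)].append(b_id)
    return {(a_id, b_id)
            for a_id, title in left.items()
            for b_id in blocks.get(blocking_key(title), [])}


def token_blocking_pairs(left, right):
    """Token blocking: pair records that share at least one lowercase title token."""
    index = defaultdict(set)
    for b_id, title in right.items():
        for tok in set(title.lower().split()):
            index[tok].add(b_id)
    return {(a_id, b_id)
            for a_id, title in left.items()
            for tok in set(title.lower().split())
            for b_id in index.get(tok, ())}


def evaluate(pairs, true_matches, n_left, n_right):
    """Return (reduction ratio, pair completeness) for a candidate pair set."""
    reduction_ratio = 1 - len(pairs) / (n_left * n_right)
    pair_completeness = len(pairs & true_matches) / len(true_matches)
    return reduction_ratio, pair_completeness


# Hypothetical toy sources (not taken from DBLP-ACM or Amazon-GoogleProducts).
LEFT = {"a1": "Efficient query processing",
        "a2": "Data integration survey",
        "a3": "Mining frequent patterns"}
RIGHT = {"b1": "efficient query processing in databases",
         "b2": "A survey of data integration",
         "b3": "Frequent pattern mining"}
TRUE_MATCHES = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

if __name__ == "__main__":
    for name, strategy in [("cartesian", cartesian_pairs),
                           ("standard", standard_blocking_pairs),
                           ("token", token_blocking_pairs)]:
        pairs = strategy(LEFT, RIGHT)
        rr, pc = evaluate(pairs, TRUE_MATCHES, len(LEFT), len(RIGHT))
        print(f"{name:>9}: {len(pairs)} pairs, RR={rr:.3f}, PC={pc:.3f}")
```

On this toy data the Cartesian product keeps all 9 pairs (no reduction, full completeness), standard blocking keeps only 1 pair but misses two true matches because their titles start with different words, and token blocking keeps exactly the 3 true matches. This illustrates the usual trade-off between pruning candidates and preserving matches that a blocking evaluation has to measure.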
Acknowledgements
We are most grateful to the Faculty of Engineering, Chiang Mai University, for supporting this study. We also extend our sincere appreciation to the Chiang Mai University Presidential Scholarship for its financial support, which greatly contributed to the completion of this study.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chantaranimi, K., Natwichai, J. (2024). Evaluation of Candidate Pair Generation Strategies in Entity Matching. In: Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 189. Springer, Cham. https://doi.org/10.1007/978-3-031-46970-1_11
DOI: https://doi.org/10.1007/978-3-031-46970-1_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46969-5
Online ISBN: 978-3-031-46970-1
eBook Packages: Engineering, Engineering (R0)