
Matching Feature Separation Network for Domain Adaptation in Entity Matching

Authors:

Tiezheng Nie

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 1975 - 1985

https://doi.org/10.1145/3589334.3645397

Published: 13 May 2024

Abstract

Entity matching (EM) determines whether two records from different data sources refer to the same real-world entity. It is a fundamental task in knowledge graph construction and data integration. Deep learning (DL) based EM methods currently achieve state-of-the-art (SOTA) results, but applying them often requires substantial human effort to label data. To address this challenge, we propose a new domain adaptation (DA) framework for EM called Matching Feature Separation Network (MFSN). We implement DA by separating private and common matching features. Briefly, MFSN first uses three encoders to explicitly model the private and common matching features in both the source and target domains. Then, it transfers the knowledge learned from the source common matching features to the target domain. We also propose an enhanced variant called Feature Representation and Separation Enhanced MFSN (MFSN-FRSE), which has stronger feature representation and separation capabilities than MFSN. We evaluate MFSN and MFSN-FRSE on twelve DA tasks in EM. The results show that our framework achieves an F1 score approximately 7% higher on average than the previous SOTA methods. We then verify the effectiveness of each module in MFSN and MFSN-FRSE through an ablation study. Finally, we explore the optimal strategy for each module in MFSN and MFSN-FRSE through detailed tests.
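The three-encoder idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the encoders are toy linear maps, the split into one source-private, one target-private, and one shared common encoder is our reading of "three encoders", and the separation term is the squared-Frobenius-norm difference loss borrowed from Domain Separation Networks, since the abstract does not state which loss MFSN uses. All names (`make_encoder`, `encode`, `difference_loss`) are hypothetical.

```python
import random


def make_encoder(in_dim, out_dim, seed):
    """Hypothetical linear encoder: a random in_dim x out_dim weight matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def encode(batch, weights):
    """Apply a linear encoder to a batch of input vectors (one vector per row)."""
    columns = list(zip(*weights))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns] for row in batch]


def difference_loss(common, private):
    """Squared Frobenius norm of common^T @ private.

    The value is 0 when every common feature column is orthogonal to every
    private feature column, so minimizing it pushes the two apart.
    (Assumed loss, taken from Domain Separation Networks.)
    """
    total = 0.0
    for c_col in zip(*common):
        for p_col in zip(*private):
            dot = sum(c * p for c, p in zip(c_col, p_col))
            total += dot * dot
    return total


# Assumed toy sizes: each record pair is already an 8-dimensional vector.
IN_DIM, FEAT_DIM, BATCH = 8, 4, 5

# Three encoders: one private per domain, one common shared by both domains.
source_private_enc = make_encoder(IN_DIM, FEAT_DIM, seed=0)
target_private_enc = make_encoder(IN_DIM, FEAT_DIM, seed=1)
common_enc = make_encoder(IN_DIM, FEAT_DIM, seed=2)

rng = random.Random(42)
source_batch = [[rng.random() for _ in range(IN_DIM)] for _ in range(BATCH)]
target_batch = [[rng.random() for _ in range(IN_DIM)] for _ in range(BATCH)]

src_common = encode(source_batch, common_enc)
src_private = encode(source_batch, source_private_enc)
tgt_common = encode(target_batch, common_enc)
tgt_private = encode(target_batch, target_private_enc)

# Separation term summed over both domains; a matcher would be trained on
# src_common (labeled) and then applied to tgt_common (unlabeled).
separation_loss = (difference_loss(src_common, src_private)
                   + difference_loss(tgt_common, tgt_private))
print("separation loss:", separation_loss)
```

In a real system the encoders would be trained jointly with a matching classifier on the source common features, so that the classifier can be reused on target common features; the sketch only shows how the feature spaces are split and how their overlap could be penalized.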



Cited By

Y. Wang and M. Yan. 2024. Unsupervised Domain Adaptation for Entity Blocking Leveraging Large Language Models. In 2024 IEEE International Conference on Big Data (BigData), 159--164. Online publication date: 15-Dec-2024.
https://doi.org/10.1109/BigData62323.2024.10825234

Index Terms

Matching Feature Separation Network for Domain Adaptation in Entity Matching
1. Information systems
  1. Data management systems
    1. Information integration
      1. Entity resolution



Published In


WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN: 9798400701719

DOI: 10.1145/3589334

General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)

Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)

Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)

Copyright © 2024 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


