DOI: 10.1145/3589334.3645397
research-article

Matching Feature Separation Network for Domain Adaptation in Entity Matching

Published: 13 May 2024

Abstract

Entity matching (EM) determines whether two records from different data sources refer to the same real-world entity. It is a fundamental task in knowledge graph construction and data integration. Deep learning (DL) based EM methods currently achieve state-of-the-art (SOTA) results, but applying them often requires substantial human effort to label data. To address this challenge, we propose a new domain adaptation (DA) framework for EM called Matching Feature Separation Network (MFSN). We implement DA by separating private and common matching features. Briefly, MFSN first uses three encoders to explicitly model the private and common matching features in both the source and target domains. Then, it transfers the knowledge learned from the source common matching features to the target domain. We also propose an enhanced variant called Feature Representation and Separation Enhanced MFSN (MFSN-FRSE), which has stronger feature representation and separation capabilities than MFSN. We evaluate MFSN and MFSN-FRSE on twelve DA tasks for EM. The results show that our framework achieves an F1 score approximately 7% higher on average than the previous SOTA methods. We then verify the effectiveness of each module in MFSN and MFSN-FRSE through an ablation study. Finally, we explore the optimal strategy for each module in MFSN and MFSN-FRSE through detailed tests.
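
The abstract reports results as F1 scores. As a reminder of what that metric measures in a matching setting, here is a minimal Python sketch that computes F1 for the "match" class. The function name `f1_score` and the toy label lists are illustrative assumptions, not taken from the paper or its evaluation code.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the positive (match) class; labels are 1 = match, 0 = non-match."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No correct matches: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 3 true matches, 3 predicted matches, 2 in common.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

Because record pairs in EM are typically dominated by non-matches, F1 on the match class is a more informative summary than plain accuracy.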

Cited By

  • (2024) Unsupervised Domain Adaptation for Entity Blocking Leveraging Large Language Models. 2024 IEEE International Conference on Big Data (BigData), 159-164. DOI: 10.1109/BigData62323.2024.10825234. Online publication date: 15-Dec-2024

    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN: 9798400701719
    DOI: 10.1145/3589334
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data integration
    2. domain adaptation
    3. entity matching
    4. knowledge graph construction
    5. matching feature separation network

    Conference

    WWW '24
    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (Last 12 months): 142
    • Downloads (Last 6 weeks): 8
    Reflects downloads up to 05 Mar 2025
