DOI: 10.1145/3533028.3533304
Research Article

Minun: evaluating counterfactual explanations for entity matching

Published: 12 June 2022

ABSTRACT

Entity Matching (EM) is an important problem in data integration and cleaning. Recently, deep learning techniques, especially pre-trained language models, have been integrated into EM applications and achieved promising results. Unfortunately, the significant performance gain comes at the cost of explainability and transparency, preventing EM from meeting the requirements of responsible data management. To address this issue, recent studies have extended explainable AI techniques to explain black-box EM models. However, these solutions have two major drawbacks: (i) their explanations do not capture the unique semantic characteristics of the EM problem; and (ii) they fail to provide an objective method to quantitatively evaluate the explanations they produce. In this paper, we propose Minun, a model-agnostic method to generate explanations for EM solutions. We use counterfactual examples generated from an EM-customized search space as explanations and develop two search algorithms to find such examples efficiently. We also propose a novel evaluation framework based on a student-teacher paradigm. The framework enables the evaluation of explanations of diverse formats by measuring the performance gain of a "student" model at simulating the target "teacher" model when explanations are given as side input. We conduct an extensive set of experiments on explaining state-of-the-art deep EM models on popular EM benchmark datasets. The results demonstrate that Minun significantly outperforms popular explainable AI methods such as LIME and SHAP in both explanation quality and scalability.
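To make the counterfactual idea concrete, here is a minimal, hypothetical sketch: given a black-box matcher and a record pair, search for a small set of token edits that flips the matcher's prediction; the edited tokens then serve as the explanation. Both `toy_matcher` (a Jaccard-similarity stand-in for a deep EM model) and the greedy deletion search are illustrative assumptions, not Minun's actual search space or algorithms.

```python
# Hypothetical sketch of a counterfactual explanation for entity matching.
# toy_matcher stands in for a black-box EM model (e.g., a fine-tuned
# language model); Minun's real search space and algorithms differ.

def toy_matcher(left_tokens, right_tokens):
    """Stand-in black-box matcher: predicts a match when the Jaccard
    similarity of the two token sets is at least 0.5."""
    l, r = set(left_tokens), set(right_tokens)
    union = l | r
    return bool(union) and len(l & r) / len(union) >= 0.5

def greedy_counterfactual(left, right, matcher):
    """Greedily delete tokens from `right` until the matcher's
    prediction flips; the deleted tokens form the explanation."""
    original = matcher(left, right)
    edited = list(right)
    removed = []
    for tok in list(right):
        edited = [t for t in edited if t != tok]
        removed.append(tok)
        if matcher(left, edited) != original:
            return removed  # edit set that flips the prediction
    return None  # no counterfactual found in this simple search space

left = "apple iphone 12 64gb black".split()
right = "apple iphone 12 64gb".split()
explanation = greedy_counterfactual(left, right, toy_matcher)
```

On this toy pair the matcher predicts a match, and deleting "apple" and "iphone" from the right record flips it to a non-match, so those two tokens are returned as the counterfactual explanation. An actual EM explainer would also consider substitutions and attribute-level edits, which is what an EM-customized search space provides.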

References

  1. A. B. Arrieta, N. D. Rodríguez, J. D. Ser, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion, 58:82--115, 2020.
  2. A. Baraldi, F. D. Buono, M. Paganelli, and F. Guerra. Using landmarks for explaining entity matching models. In EDBT, pages 451--456, 2021.
  3. N. Barlaug. LEMON: explainable entity matching. CoRR, abs/2110.00516, 2021.
  4. U. Brunner and K. Stockinger. Entity matching with transformer architectures - a step forward in data integration. In EDBT, pages 463--473, 2020.
  5. V. D. Cicco, D. Firmani, N. Koudas, P. Merialdo, and D. Srivastava. Interpreting deep learning models for entity resolution: an experience report using LIME. In aiDM@SIGMOD, pages 8:1--8:4, 2019.
  6. K. Clark, U. Khandelwal, O. Levy, and C. D. Manning. What does BERT look at? An analysis of BERT's attention. In BlackboxNLP@ACL, pages 276--286, 2019.
  7. F. Doshi-Velez and B. Kim. A roadmap for a rigorous science of interpretability. CoRR, abs/1702.08608, 2017.
  8. A. Ebaid, S. Thirumuruganathan, W. G. Aref, A. K. Elmagarmid, and M. Ouzzani. EXPLAINER: entity resolution explanations. In ICDE, pages 2000--2003, 2019.
  9. M. Ebraheem, S. Thirumuruganathan, S. R. Joty, M. Ouzzani, and N. Tang. Distributed representations of tuples for entity resolution. PVLDB, 11(11):1454--1467, 2018.
  10. C. Fu, X. Han, J. He, and L. Sun. Hierarchical matching network for heterogeneous entity resolution. In IJCAI, pages 3665--3671, 2020.
  11. C. Fu, X. Han, L. Sun, B. Chen, W. Zhang, S. Wu, and H. Kong. End-to-end multi-perspective matching for entity resolution. In IJCAI, pages 4961--4967, 2019.
  12. S. Galhotra, R. Pradhan, and B. Salimi. Explaining black-box algorithms using probabilistic contrastive counterfactuals. In SIGMOD, pages 577--590, 2021.
  13. L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, pages 491--500, 2001.
  14. R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi. A survey of methods for explaining black box models. ACM Comput. Surv., 51(5):93:1--93:42, 2019.
  15. P. Hase and M. Bansal. Evaluating explainable AI: which algorithmic explanations help users predict model behavior? In ACL, pages 5540--5552, 2020.
  16. S. Jain and B. C. Wallace. Attention is not explanation. In NAACL-HLT, pages 3543--3556, 2019.
  17. J. Kasai, K. Qian, S. Gurajada, Y. Li, and L. Popa. Low-resource deep entity resolution with transfer and active learning. In ACL, pages 5851--5861, 2019.
  18. P. Konda, S. Das, P. S. G. C., A. Doan, et al. Magellan: toward building entity matching management systems. PVLDB, 9(12):1197--1208, 2016.
  19. Y. Li, J. Li, Y. Suhara, A. Doan, and W. Tan. Deep entity matching with pre-trained language models. PVLDB, 14(1):50--60, 2020.
  20. Y. Li, J. Li, Y. Suhara, J. Wang, W. Hirota, and W. Tan. Deep entity matching: challenges and opportunities. ACM J. Data Inf. Qual., 13(1):1:1--1:17, 2021.
  21. S. M. Lundberg and S. Lee. A unified approach to interpreting model predictions. In NIPS, pages 4765--4774, 2017.
  22. Z. Miao, Y. Li, and X. Wang. Rotom: a meta-learned data augmentation framework for entity matching, data cleaning, text classification, and beyond. In SIGMOD, pages 1303--1316, 2021.
  23. C. Molnar. Interpretable Machine Learning. Lulu.com, 2020.
  24. S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y. Park, G. Krishnan, R. Deep, E. Arcaute, and V. Raghavendra. Deep learning for entity matching: a design space exploration. In SIGMOD, pages 19--34, 2018.
  25. H. Nie, X. Han, B. He, L. Sun, B. Chen, W. Zhang, S. Wu, and H. Kong. Deep sequence-to-sequence entity matching for heterogeneous entity resolution. In CIKM, pages 629--638, 2019.
  26. G. Papadakis, D. Skoutas, E. Thanos, and T. Palpanas. Blocking and filtering techniques for entity resolution: a survey. ACM Comput. Surv., 53(2):31:1--31:42, 2020.
  27. R. Peeters and C. Bizer. Dual-objective fine-tuning of BERT for entity matching. PVLDB, 14(10):1913--1921, 2021.
  28. R. Peeters, C. Bizer, and G. Glavas. Intermediate training of BERT for product matching. In DI2KG@VLDB, 2020.
  29. D. Pruthi, B. Dhingra, L. B. Soares, M. Collins, Z. C. Lipton, G. Neubig, and W. W. Cohen. Evaluating explanations: how much do explanations from the teacher aid students? CoRR, abs/2012.00893, 2020.
  30. M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": explaining the predictions of any classifier. In SIGKDD, pages 1135--1144, 2016.
  31. M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: high-precision model-agnostic explanations. In AAAI, pages 1527--1535, 2018.
  32. M. Schleich, Z. Geng, Y. Zhang, and D. Suciu. GeCo: quality counterfactual explanations in real time. PVLDB, 14(9):1681--1693, 2021.
  33. S. Serrano and N. A. Smith. Is attention interpretable? In ACL, pages 2931--2951, 2019.
  34. J. Stoyanovich, B. Howe, and H. V. Jagadish. Responsible data management. PVLDB, 13(12):3474--3488, 2020.
  35. S. Thirumuruganathan, M. Ouzzani, and N. Tang. Explaining entity resolution predictions: where are we and what needs to be done? In HILDA@SIGMOD, pages 10:1--10:6, 2019.
  36. S. Thirumuruganathan, N. Tang, M. Ouzzani, and A. Doan. Data curation with deep learning. In EDBT, pages 277--286, 2020.
  37. B. van Aken, B. Winter, A. Löser, and F. A. Gers. How does BERT answer questions? A layer-wise analysis of transformer representations. In CIKM, pages 1823--1832, 2019.
  38. J. Wang, Y. Li, and W. Hirota. Machamp: a generalized entity matching benchmark. In CIKM, pages 4633--4642, 2021.
  39. J. Wang, C. Lin, M. Li, and C. Zaniolo. Boosting approximate dictionary-based entity extraction with synonyms. Inf. Sci., 530:1--21, 2020.
  40. J. Wang, C. Lin, and C. Zaniolo. MF-Join: efficient fuzzy string similarity join with multi-level filtering. In ICDE, pages 386--397, 2019.
  41. S. Wiegreffe and Y. Pinter. Attention is not not explanation. In EMNLP-IJCNLP, pages 11--20, 2019.
  42. R. Wu, S. Chaba, S. Sawlani, X. Chu, and S. Thirumuruganathan. ZeroER: entity resolution using zero labeled examples. In SIGMOD, pages 1149--1164, 2020.

Published in

    DEEM '22: Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning
    June 2022
    63 pages
    ISBN: 9781450393751
    DOI: 10.1145/3533028

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    DEEM '22 paper acceptance rate: 9 of 13 submissions (69%). Overall acceptance rate: 23 of 37 submissions (62%).
