ABSTRACT
Entity Matching (EM) is an important problem in data integration and cleaning. Recently, deep learning techniques, especially pre-trained language models, have been applied to EM and achieved promising results. Unfortunately, this significant performance gain comes at the cost of explainability and transparency, keeping EM solutions from meeting the requirements of responsible data management. To address this issue, recent studies have extended explainable AI techniques to explain black-box EM models. However, these solutions have two major drawbacks: (i) their explanations do not capture the unique semantic characteristics of the EM problem; and (ii) they fail to provide an objective method for quantitatively evaluating the explanations they produce. In this paper, we propose Minun, a model-agnostic method for generating explanations for EM solutions. We use counterfactual examples generated from an EM-customized search space as explanations and develop two search algorithms to find such results efficiently. We also propose a novel evaluation framework based on a student-teacher paradigm. The framework enables the evaluation of explanations in diverse formats by measuring the performance gain of a "student" model at simulating the target "teacher" model when explanations are given as side input. We conduct an extensive set of experiments explaining state-of-the-art deep EM models on popular EM benchmark datasets. The results demonstrate that Minun significantly outperforms popular explainable AI methods such as LIME and SHAP in both explanation quality and scalability.
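To make the idea of counterfactual explanations for EM concrete, the following is a minimal, hypothetical sketch (not Minun's actual algorithm or API): given an entity pair and a black-box matcher, it searches for the smallest set of tokens whose removal flips the model's prediction, and returns those tokens as the explanation. The `predict_match` function is a stand-in toy model assumed only for this illustration.

```python
from itertools import combinations

def predict_match(left_tokens, right_tokens):
    """Toy stand-in for a black-box EM model:
    predict 'match' if Jaccard similarity of token sets > 0.5."""
    a, b = set(left_tokens), set(right_tokens)
    return len(a & b) / len(a | b) > 0.5

def counterfactual_tokens(left_tokens, right_tokens):
    """Exhaustively search (smallest edits first) for the minimal set of
    tokens whose removal from the left entity flips the prediction;
    those tokens act as a counterfactual explanation."""
    original = predict_match(left_tokens, right_tokens)
    for k in range(1, len(left_tokens) + 1):
        for subset in combinations(range(len(left_tokens)), k):
            edited = [t for i, t in enumerate(left_tokens) if i not in subset]
            if predict_match(edited, right_tokens) != original:
                return [left_tokens[i] for i in subset]
    return []  # no single-side edit flips the prediction

left = ["apple", "iphone", "12", "64gb"]
right = ["apple", "iphone", "12", "128gb"]
print(counterfactual_tokens(left, right))  # -> ['apple']
```

The exhaustive loop is exponential in the number of tokens; the paper's contribution includes search algorithms over an EM-customized edit space that avoid this brute-force enumeration.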