DOI: 10.1145/3643491.3660283
Research article · Open access

Evaluating Human-Centered AI Explanations: Introduction of an XAI Evaluation Framework for Fact-Checking

Published: 10 June 2024

Abstract

The rapidly increasing amount of online information and the advent of Generative Artificial Intelligence (GenAI) make manual verification of information impractical. Consequently, AI systems are deployed to detect disinformation and deepfakes. Prior studies have indicated that combining AI and human capabilities yields enhanced performance in detecting disinformation. Furthermore, the European Union (EU) AI Act mandates human supervision for AI applications in areas impacting essential human rights, such as freedom of speech, and requires that AI systems be transparent and provide adequate explanations to ensure comprehensibility. Extensive research has been conducted on incorporating explainability (XAI) features to increase AI transparency, yet these efforts often lack a human-centered assessment, and the effectiveness of such explanations also varies with users' prior knowledge and personal attributes. We therefore developed a framework for validating XAI features for the collaborative human-AI fact-checking task. The framework allows XAI features to be tested along objective and subjective evaluation dimensions and follows human-centered design principles when displaying information about the AI system to users. The framework was tested in a crowdsourcing experiment on the collaborative disinformation detection task with 433 participants, comprising 406 crowdworkers and 27 journalists. The tested XAI features increase the AI system's perceived usefulness and understandability as well as users' trust in it. With this publication, the XAI evaluation framework is made open source.
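As a rough, hypothetical sketch of the kind of analysis such an evaluation framework supports (not the authors' released implementation), the Python snippet below aggregates per-item participant judgments into one objective dimension, detection accuracy with and without XAI features, and three subjective Likert-scale dimensions, perceived usefulness, understandability, and trust. The record schema and all field names are assumptions made purely for illustration.

from dataclasses import dataclass
from statistics import mean

# Hypothetical per-item record from a crowdsourcing run; the field names are
# illustrative assumptions, not the schema of the open-source framework.
@dataclass
class Judgment:
    correct: bool          # did the participant's verdict match ground truth?
    saw_explanation: bool  # was an XAI feature shown for this item?
    usefulness: int        # subjective Likert ratings, 1-5
    understandability: int
    trust: int

def evaluate(judgments: list[Judgment]) -> dict[str, float]:
    """Aggregate one objective and three subjective evaluation dimensions."""
    with_xai = [j for j in judgments if j.saw_explanation]
    without_xai = [j for j in judgments if not j.saw_explanation]
    return {
        # Objective dimension: detection accuracy with vs. without explanations.
        "accuracy_with_xai": mean(j.correct for j in with_xai),
        "accuracy_without_xai": mean(j.correct for j in without_xai),
        # Subjective dimensions, averaged over items where XAI was shown.
        "perceived_usefulness": mean(j.usefulness for j in with_xai),
        "understandability": mean(j.understandability for j in with_xai),
        "trust": mean(j.trust for j in with_xai),
    }

if __name__ == "__main__":
    demo = [
        Judgment(True, True, 4, 5, 4),
        Judgment(False, False, 3, 3, 2),
        Judgment(True, True, 5, 4, 4),
        Judgment(True, False, 2, 3, 3),
    ]
    print(evaluate(demo))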


Cited By

  • (2024) MAD '24 Workshop: Multimedia AI against Disinformation. Proceedings of the 2024 International Conference on Multimedia Retrieval, 1339–1341. https://doi.org/10.1145/3652583.3660000. Online publication date: 30-May-2024.
  • (2024) The Role of Explainability in Collaborative Human-AI Disinformation Detection. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2157–2174. https://doi.org/10.1145/3630106.3659031. Online publication date: 3-Jun-2024.

Information

Published In

MAD '24: Proceedings of the 3rd ACM International Workshop on Multimedia AI against Disinformation
June 2024, 107 pages
ISBN: 9798400705526
DOI: 10.1145/3643491
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 10 June 2024

Author Tags

  1. Human-centered eXplanations
  2. blind trust in AI systems
  3. objective and subjective evaluation of eXplanations

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • BMBF

Conference

ICMR '24

Article Metrics

  • Downloads (Last 12 months): 842
  • Downloads (Last 6 weeks): 149
Reflects downloads up to 03 Jan 2025

