XAI tools in the public sector: a case study on predicting combined sewer overflows

ABSTRACT
Artificial intelligence and deep learning are becoming increasingly prevalent in contemporary software solutions. Explainable artificial intelligence (XAI) tools attempt to address the black-box nature of deep learning models and make them more understandable to humans. In this work, we apply three state-of-the-art XAI tools in a real-world case study of predicting combined sewer overflow events for a municipal wastewater treatment organization. Through a data-driven inquiry, we collect both qualitative information via stakeholder interviews and quantitative measures, which help us assess the predictive accuracy of the XAI tools as well as the simplicity, soundness, and insightfulness of the explanations they produce. Our results not only show the varying degrees to which the XAI tools meet these requirements, but also highlight that domain experts can draw new insights from complex explanations that may differ from their prior expectations.