DOI: 10.1145/3563657.3596046
Research Article · Open Access

From Discovery to Adoption: Understanding the ML Practitioners’ Interpretability Journey

Published: 10 July 2023

Abstract

Models are interpretable when machine learning (ML) practitioners can readily understand the reasoning behind their predictions. Ironically, little is known about ML practitioners’ experience of discovering and adopting novel interpretability techniques in production settings. In a qualitative study with 18 practitioners working with text data at a large technology company, we found that despite varied tasks, practitioners encountered nearly identical challenges related to interpretability methods in their model-analysis workflows. These challenges stem from problem formulation, the social nature of interpretability investigations, and non-standard practices in cross-functional organizational contexts. A follow-up examination of early-stage design probes with seven practitioners suggests that self-reported experts are “perpetual intermediates” who can benefit from regular, responsive, and in-situ education about interpretability methods across workflows, regardless of prior experience with models, analysis tools, or interpretability techniques. From these findings, we emphasize the need for multi-stage support for learning interpretability methods in real-world NLP applications.
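
As a concrete, purely illustrative example of the kind of interpretability method the study's participants were discovering and adopting, the sketch below computes gradient × input salience for a toy PyTorch text classifier: each token receives a score reflecting how strongly the predicted class responds to its embedding. The vocabulary, model dimensions, and pooling classifier are assumptions invented for this sketch; they are not drawn from the paper or from any specific tool it studied.

    # Minimal sketch of gradient-x-input salience for a toy text classifier.
    # All specifics (vocabulary, embedding size, classifier) are illustrative
    # assumptions, not details from the paper.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
    embed = nn.Embedding(len(vocab), 8)
    classifier = nn.Linear(8, 2)  # two classes: 0 = negative, 1 = positive

    tokens = ["the", "movie", "was", "great"]
    ids = torch.tensor([[vocab[t] for t in tokens]])  # shape (1, seq_len)

    emb = embed(ids)      # (1, seq_len, 8); non-leaf tensor
    emb.retain_grad()     # keep its gradient so we can read it after backward()
    logits = classifier(emb.mean(dim=1))  # mean-pool tokens, then classify
    logits[0, 1].backward()  # gradient of the "positive" logit w.r.t. embeddings

    # Gradient x input, summed over the embedding dimension: one score per token.
    salience = (emb.grad * emb.detach()).sum(dim=-1).squeeze(0)
    for token, score in zip(tokens, salience.tolist()):
        print(f"{token:10s} {score:+.4f}")

In practice, practitioners typically reach such methods through interactive tooling (e.g., the Language Interpretability Tool) rather than hand-rolled code, but the underlying computation is the same.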


Cited By

  • (2024) Understanding the Dataset Practitioners Behind Large Language Models. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’24), 1–7. https://doi.org/10.1145/3613905.3651007. Online publication date: 11 May 2024.
  • (2024) Trust in AI-assisted Decision Making: Perspectives from Those Behind the System and Those for Whom the Decision is Made. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24), 1–14. https://doi.org/10.1145/3613904.3642018. Online publication date: 11 May 2024.

Published In

DIS '23: Proceedings of the 2023 ACM Designing Interactive Systems Conference
July 2023
2717 pages
ISBN:9781450398930
DOI:10.1145/3563657
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 July 2023

Author Tags

  1. Interpretability
  2. ML practitioners
  3. learnability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DIS ’23: Designing Interactive Systems Conference
July 10–14, 2023
Pittsburgh, PA, USA

Acceptance Rates

Overall Acceptance Rate: 1,158 of 4,684 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 367
  • Downloads (last 6 weeks): 70
Reflects downloads up to 20 Jan 2025
