DOI: 10.1145/3563657.3596046
Research Article · Open Access

From Discovery to Adoption: Understanding the ML Practitioners’ Interpretability Journey

Published: 10 July 2023

Abstract

Models are interpretable when machine learning (ML) practitioners can readily understand the reasoning behind their predictions. Ironically, little is known about ML practitioners’ experience of discovering and adopting novel interpretability techniques in production settings. In a qualitative study with 18 practitioners working with text data at a large technology company, we found that despite varied tasks, practitioners encountered nearly identical challenges related to interpretability methods in their model-analysis workflows. These challenges stem from problem formulation, the social nature of interpretability investigations, and non-standard practices in cross-functional organizational contexts. A follow-up examination of early-stage design probes with seven practitioners suggests that self-reported experts are “perpetual intermediates” who can benefit from regular, responsive, and in-situ education about interpretability methods across workflows, regardless of prior experience with models, analysis tools, or interpretability techniques. From these findings, we emphasize the need for multi-stage support for learning interpretability methods in real-world NLP applications.
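
As a concrete, purely illustrative example of the kind of interpretability method the study's participants were discovering and adopting, the sketch below computes gradient × input salience for a toy PyTorch text classifier: each token receives a score reflecting how strongly the predicted class responds to its embedding. The vocabulary, model dimensions, and pooling classifier are assumptions invented for this sketch; they are not drawn from the paper or from any specific tool it studied.

    # Minimal sketch of gradient-x-input salience for a toy text classifier.
    # All specifics (vocabulary, embedding size, classifier) are illustrative
    # assumptions, not details from the paper.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
    embed = nn.Embedding(len(vocab), 8)
    classifier = nn.Linear(8, 2)  # two classes: 0 = negative, 1 = positive

    tokens = ["the", "movie", "was", "great"]
    ids = torch.tensor([[vocab[t] for t in tokens]])  # shape (1, seq_len)

    emb = embed(ids)      # (1, seq_len, 8); non-leaf tensor
    emb.retain_grad()     # keep its gradient so we can read it after backward()
    logits = classifier(emb.mean(dim=1))  # mean-pool tokens, then classify
    logits[0, 1].backward()  # gradient of the "positive" logit w.r.t. embeddings

    # Gradient x input, summed over the embedding dimension: one score per token.
    salience = (emb.grad * emb.detach()).sum(dim=-1).squeeze(0)
    for token, score in zip(tokens, salience.tolist()):
        print(f"{token:10s} {score:+.4f}")

In practice, practitioners typically reach such methods through interactive tooling (e.g., the Language Interpretability Tool) rather than hand-rolled code, but the underlying computation is the same.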


Cited By

  • (2024) Understanding the Dataset Practitioners Behind Large Language Models. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’24), 1–7. https://doi.org/10.1145/3613905.3651007. Online publication date: 11 May 2024.
  • (2024) Trust in AI-assisted Decision Making: Perspectives from Those Behind the System and Those for Whom the Decision is Made. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24), 1–14. https://doi.org/10.1145/3613904.3642018. Online publication date: 11 May 2024.

Published In

DIS '23: Proceedings of the 2023 ACM Designing Interactive Systems Conference
July 2023
2717 pages
ISBN:9781450398930
DOI:10.1145/3563657
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 July 2023

Author Tags

  1. Interpretability
  2. ML practitioners
  3. learnability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DIS ’23: Designing Interactive Systems Conference
July 10–14, 2023
Pittsburgh, PA, USA

Acceptance Rates

Overall Acceptance Rate: 1,158 of 4,684 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 367
  • Downloads (last 6 weeks): 70
Reflects downloads up to 20 Jan 2025
