Where Are My Intelligent Assistant’s Mistakes? A Systematic Testing Approach

Kulesza, Todd; Burnett, Margaret; Stumpf, Simone; Wong, Weng-Keen; Das, Shubhomoy; Groce, Alex; Shinsel, Amber; Bice, Forrest; McIntosh, Kevin

doi:10.1007/978-3-642-21530-8_14


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6654)

Included in the following conference series:

International Symposium on End User Development


Abstract

Intelligent assistants are handling increasingly critical tasks, but until now, end users have had no way to systematically assess where their assistants make mistakes. For some intelligent assistants, this is a serious problem: if the assistant is doing work that is important, such as assisting with qualitative research or monitoring an elderly parent’s safety, the user may pay a high cost for unnoticed mistakes. This paper addresses the problem with WYSIWYT/ML (What You See Is What You Test for Machine Learning), a human/computer partnership that enables end users to systematically test intelligent assistants. Our empirical evaluation shows that WYSIWYT/ML helped end users find assistants’ mistakes significantly more effectively than ad hoc testing. Not only did it allow users to assess an assistant’s work on an average of 117 predictions in only 10 minutes, it also scaled to a much larger data set, assessing an assistant’s work on 623 out of 1,448 predictions using only the users’ original 10 minutes’ testing effort.

Author information

Authors and Affiliations

School of EECS, Kelley Engr. Center, Oregon State University, Corvallis, OR, 97331, United States
Todd Kulesza, Margaret Burnett, Weng-Keen Wong, Shubhomoy Das, Alex Groce, Amber Shinsel, Forrest Bice & Kevin McIntosh
Centre for HCI Design, City University London, Northampton Square, London, EC1V 0HB, UK
Simone Stumpf

Editor information

Editors and Affiliations

Università degli Studi di Bari “A. Moro”, 70125, Bari, Italy
Maria Francesca Costabile & Antonio Piccinno
IT University of Copenhagen, 2300, Copenhagen, Denmark
Yvonne Dittrich
University of Colorado at Boulder, 80309-0430, Boulder, CO, USA
Gerhard Fischer

About this paper

Cite this paper

Kulesza, T. et al. (2011). Where Are My Intelligent Assistant’s Mistakes? A Systematic Testing Approach. In: Costabile, M.F., Dittrich, Y., Fischer, G., Piccinno, A. (eds) End-User Development. IS-EUD 2011. Lecture Notes in Computer Science, vol 6654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21530-8_14

DOI: https://doi.org/10.1007/978-3-642-21530-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21529-2
Online ISBN: 978-3-642-21530-8
eBook Packages: Computer Science, Computer Science (R0)
