
Automatic Detection of Usability Problem Encounters in Think-aloud Sessions

Published: 30 May 2020

Abstract

Think-aloud protocols are a highly valued usability testing method for identifying usability problems. Despite their value, analyzing think-aloud sessions is often time-consuming and labor-intensive. Consequently, previous research has urged the community to develop techniques that support fast-paced analysis. In this work, we took a first step toward designing and evaluating machine learning (ML) models that automatically detect usability problem encounters based on users’ verbalizations and speech features in think-aloud sessions. Inspired by recent research showing that subtle patterns in users’ verbalizations and speech features tend to occur when they encounter problems, we examined whether these patterns can be leveraged to improve the automatic detection of usability problems. We first conducted and recorded think-aloud sessions and then examined the effects of different input features, ML models, test products, and users on the detection of usability problem encounters. Our work uncovers several technical and user interface design challenges and sets a baseline for automating usability problem detection and integrating such automation into UX practitioners’ workflows.
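The detection idea the abstract describes can be sketched in miniature: score each think-aloud segment by combining the two signal families mentioned there, verbalization content and speech features. This is a purely illustrative toy, not the authors' trained models; the cue words, feature names (`pitch_change`, `pause_before`), and thresholds below are all assumptions made up for the sketch.

```python
from dataclasses import dataclass

# Toy sketch (not the paper's method): flag think-aloud segments that may
# correspond to usability problem encounters. Cue words, features, and
# thresholds are illustrative assumptions only.

HEDGE_CUES = {"hmm", "weird", "confusing", "why", "stuck", "oops", "strange"}

@dataclass
class Segment:
    text: str            # transcribed verbalization
    pitch_change: float  # assumed: relative pitch rise vs. speaker baseline
    pause_before: float  # assumed: silence (seconds) preceding the segment

def problem_score(seg: Segment) -> float:
    """Return a heuristic score in [0, 1]; higher = more likely a problem."""
    words = [w.strip(".,!?") for w in seg.text.lower().split()]
    cue_hits = sum(w in HEDGE_CUES for w in words)
    score = min(cue_hits * 0.3, 0.6)                  # verbalization cues
    score += 0.2 if seg.pitch_change > 0.15 else 0.0  # raised pitch
    score += 0.2 if seg.pause_before > 1.0 else 0.0   # long hesitation
    return min(score, 1.0)

segments = [
    Segment("Okay, I click the search button here", 0.02, 0.2),
    Segment("Hmm, that is weird, why did it go back?", 0.22, 1.4),
]
flagged = [s.text for s in segments if problem_score(s) >= 0.5]
```

In the paper's actual setting these hand-tuned rules would be replaced by ML models trained on labeled sessions, but the sketch shows how lexical and prosodic evidence can be fused into a single per-segment decision.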




      Published In

ACM Transactions on Interactive Intelligent Systems, Volume 10, Issue 2
June 2020, 155 pages
ISSN: 2160-6455
EISSN: 2160-6463
DOI: 10.1145/3403610

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 May 2020
      Online AM: 07 May 2020
      Accepted: 01 February 2020
      Revised: 01 February 2020
      Received: 01 May 2019
      Published in TIIS Volume 10, Issue 2


      Author Tags

      1. AI-assisted UX analysis method
      2. Think aloud
      3. machine learning
      4. speech features
      5. usability problem
      6. user experience (UX)
      7. verbalization

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Department of Computer Science at the University of Toronto


Cited By

• (2024) Exploring the Impact of Artificial Intelligence-Generated Content (AIGC) Tools on Social Dynamics in UX Collaboration. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 1594-1606. DOI: 10.1145/3643834.3660703
• (2024) Enhancing UX Evaluation Through Collaboration with Conversational AI Assistants: Effects of Proactive Dialogue and Timing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3613904.3642168
• (2024) uxSense: Supporting User Experience Analysis with Visualization and Computer Vision. IEEE Transactions on Visualization and Computer Graphics 30, 7, 3841-3856. DOI: 10.1109/TVCG.2023.3241581
• (2024) Potential effectiveness and efficiency issues in usability evaluation within digital health: A systematic literature review. Journal of Systems and Software 208, 111881. DOI: 10.1016/j.jss.2023.111881
• (2024) Designing a conversational agent for supporting data exploration in citizen science. Electronic Markets 34, 1. DOI: 10.1007/s12525-024-00705-3
• (2023) The Think-Aloud Method for Evaluating the Usability of a Regional Atlas. ISPRS International Journal of Geo-Information 12, 3, 95. DOI: 10.3390/ijgi12030095
• (2023) Crafting Human-AI Collaborative Analysis for User Experience Evaluation. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-6. DOI: 10.1145/3544549.3577042
• (2023) Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text). Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3544548.3581247
• (2023) Older Adults’ Concurrent and Retrospective Think-Aloud Verbalizations for Identifying User Experience Problems of VR Games. Interacting with Computers 34, 4, 99-115. DOI: 10.1093/iwc/iwac039
• (2023) Integrating user experience assessment in Re-CRUD console framework development. Wireless Networks 29, 1, 109-127. DOI: 10.1007/s11276-022-03098-3
