
Automatic Detection of Usability Problem Encounters in Think-aloud Sessions

Published: 30 May 2020

Abstract

Think-aloud protocols are a highly valued usability testing method for identifying usability problems. Despite their value, analyzing think-aloud sessions is often time-consuming and labor-intensive. Consequently, previous research has urged the community to develop techniques that support fast-paced analysis. In this work, we took a first step toward designing and evaluating machine learning (ML) models that automatically detect usability problem encounters based on users’ verbalizations and speech features in think-aloud sessions. Inspired by recent research showing that subtle patterns in users’ verbalizations and speech features tend to occur when they encounter problems, we examined whether these patterns can be leveraged to improve the automatic detection of usability problems. We first conducted and recorded think-aloud sessions and then examined the effects of different input features, ML models, test products, and users on the detection of usability problem encounters. Our work uncovers several technical and user interface design challenges and sets a baseline for automating usability problem detection and integrating such automation into UX practitioners’ workflows.
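The detection idea the abstract describes can be sketched in miniature: score each think-aloud segment by combining the two signal families mentioned there, verbalization content and speech features. This is a purely illustrative toy, not the authors' trained models; the cue words, feature names (`pitch_change`, `pause_before`), and thresholds below are all assumptions made up for the sketch.

```python
from dataclasses import dataclass

# Toy sketch (not the paper's method): flag think-aloud segments that may
# correspond to usability problem encounters. Cue words, features, and
# thresholds are illustrative assumptions only.

HEDGE_CUES = {"hmm", "weird", "confusing", "why", "stuck", "oops", "strange"}

@dataclass
class Segment:
    text: str            # transcribed verbalization
    pitch_change: float  # assumed: relative pitch rise vs. speaker baseline
    pause_before: float  # assumed: silence (seconds) preceding the segment

def problem_score(seg: Segment) -> float:
    """Return a heuristic score in [0, 1]; higher = more likely a problem."""
    words = [w.strip(".,!?") for w in seg.text.lower().split()]
    cue_hits = sum(w in HEDGE_CUES for w in words)
    score = min(cue_hits * 0.3, 0.6)                  # verbalization cues
    score += 0.2 if seg.pitch_change > 0.15 else 0.0  # raised pitch
    score += 0.2 if seg.pause_before > 1.0 else 0.0   # long hesitation
    return min(score, 1.0)

segments = [
    Segment("Okay, I click the search button here", 0.02, 0.2),
    Segment("Hmm, that is weird, why did it go back?", 0.22, 1.4),
]
flagged = [s.text for s in segments if problem_score(s) >= 0.5]
```

In the paper's actual setting these hand-tuned rules would be replaced by ML models trained on labeled sessions, but the sketch shows how lexical and prosodic evidence can be fused into a single per-segment decision.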




      Published In

ACM Transactions on Interactive Intelligent Systems, Volume 10, Issue 2
June 2020, 155 pages
ISSN: 2160-6455
EISSN: 2160-6463
DOI: 10.1145/3403610

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 May 2020
      Online AM: 07 May 2020
      Accepted: 01 February 2020
      Revised: 01 February 2020
      Received: 01 May 2019
      Published in TIIS Volume 10, Issue 2


      Author Tags

      1. AI-assisted UX analysis method
      2. Think aloud
      3. machine learning
      4. speech features
      5. usability problem
      6. user experience (UX)
      7. verbalization

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Department of Computer Science at the University of Toronto


Cited By

• (2024) Exploring the Impact of Artificial Intelligence-Generated Content (AIGC) Tools on Social Dynamics in UX Collaboration. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 1594-1606. DOI: 10.1145/3643834.3660703
• (2024) Enhancing UX Evaluation Through Collaboration with Conversational AI Assistants: Effects of Proactive Dialogue and Timing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3613904.3642168
• (2024) uxSense: Supporting User Experience Analysis with Visualization and Computer Vision. IEEE Transactions on Visualization and Computer Graphics 30, 7, 3841-3856. DOI: 10.1109/TVCG.2023.3241581
• (2024) Potential effectiveness and efficiency issues in usability evaluation within digital health: A systematic literature review. Journal of Systems and Software 208, 111881. DOI: 10.1016/j.jss.2023.111881
• (2024) Designing a conversational agent for supporting data exploration in citizen science. Electronic Markets 34, 1. DOI: 10.1007/s12525-024-00705-3
• (2023) The Think-Aloud Method for Evaluating the Usability of a Regional Atlas. ISPRS International Journal of Geo-Information 12, 3, 95. DOI: 10.3390/ijgi12030095
• (2023) Crafting Human-AI Collaborative Analysis for User Experience Evaluation. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-6. DOI: 10.1145/3544549.3577042
• (2023) Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text). Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3544548.3581247
• (2023) Older Adults’ Concurrent and Retrospective Think-Aloud Verbalizations for Identifying User Experience Problems of VR Games. Interacting with Computers 34, 4, 99-115. DOI: 10.1093/iwc/iwac039
• (2023) Integrating user experience assessment in Re-CRUD console framework development. Wireless Networks 29, 1, 109-127. DOI: 10.1007/s11276-022-03098-3
