From Lab to Virtual: Comparing Real and AI-Generated User Interviews in Home Appliance Evaluation.

Authors:
Minghui Liu
Academy of Arts & Design, Tsinghua University, China and The Future Laboratory, Tsinghua University, China
ORCID: 0000-0002-6136-2094

Cheng Xue
Academy of Arts & Design, Tsinghua University, China and The Future Laboratory, Tsinghua University, China
ORCID: 0000-0001-7917-5521

Yuxiang Zhai
Academy of Arts & Design, Tsinghua University, China and The Future Laboratory, Tsinghua University, China
ORCID: 0009-0002-4189-6141

CHCHI '23: Proceedings of the Eleventh International Symposium of Chinese CHI, November 2023, Pages 571–581
https://doi.org/10.1145/3629606.3629672

Published: 27 February 2024

ABSTRACT

This study examines the use of conversational AI, specifically ChatGPT, in household appliance evaluation interviews and how its responses differ from real user behaviour. Three comparison experiments (real researcher with real user, real researcher with simulated user, and simulated researcher with simulated user) reveal differences between the responses of ChatGPT-simulated users and real users in specific evaluation scenarios, especially in the evaluation of product appearance, GUI, and PUI. Simulated users agreed with real users when evaluating the core features of smart appliances, but they showed limitations in aspects that depend on practical experience, and SUS, learning ability, and usability scores differed significantly across experimental settings. The study also discusses the advantages and disadvantages of incorporating simulated users into the product evaluation process and concludes that the approach, although challenging, shows considerable potential for future product evaluation.
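
For readers unfamiliar with the SUS scores mentioned in the abstract, the following is a minimal sketch of the standard ten-item System Usability Scale scoring rule, which could be applied identically to real and simulated respondents. The function name `sus_score` and the sample answers are illustrative assumptions; they are not taken from the study and do not reproduce its data or analysis.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    responses: ten integers from 1 to 5, in questionnaire order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, r in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = r - 1
            total += r - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - r
            total += 5 - r
    # The summed contributions (0-40) are scaled to a 0-100 range.
    return total * 2.5


# Hypothetical answers for illustration only, not data from the study.
real_user = [4, 2, 4, 1, 5, 2, 4, 2, 4, 3]
simulated_user = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]
print(sus_score(real_user), sus_score(simulated_user))
```

Scoring both groups with one function keeps the comparison across experimental settings on the same scale; any statistical test of the difference would be a separate step.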

Index Terms

Human-centered computing → Human computer interaction (HCI) → HCI design and evaluation methods → User studies

Published in

CHCHI '23: Proceedings of the Eleventh International Symposium of Chinese CHI
November 2023
634 pages
ISBN:9798400716454
DOI:10.1145/3629606

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Artificial Intelligence
Large Language Model
User Study
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 17 of 40 submissions, 43%