ABSTRACT
This study examines the use of conversational AI, particularly ChatGPT, in household appliance evaluation interviews and how its responses differ from real user behaviour. Three comparison experiments (real researcher with real user, real researcher with simulated user, and simulated researcher with simulated user) reveal differences between the responses of ChatGPT-simulated users and real users in specific evaluation scenarios, especially in the evaluation of product appearance, GUI, and PUI. Although simulated users agreed with real users when evaluating the core features of smart appliances, they showed limitations in aspects that depend on practical experience, and SUS, learnability, and usability scores differed significantly across experimental settings. The study also explores the advantages and disadvantages of incorporating simulated users into the product evaluation process and concludes that doing so is an innovative, if challenging, approach that shows considerable potential for future product evaluation.
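For readers unfamiliar with the scores named above, the following is a minimal sketch of standard SUS scoring for a single questionnaire. The abstract does not say how the learnability and usability scores were derived; the split used here (items 4 and 10 for learnability, the remaining eight items for usability) is an assumption based on the commonly used two-factor reading of the SUS, and the function name `sus_scores` is hypothetical.

```python
from typing import Dict, Sequence


def sus_scores(responses: Sequence[int]) -> Dict[str, float]:
    """Score one 10-item SUS questionnaire (each response on a 1-5 scale)."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten responses, each an integer from 1 to 5")

    # Odd-numbered items are positively worded: contribution = response - 1.
    # Even-numbered items are negatively worded: contribution = 5 - response.
    contrib = [(r - 1) if i % 2 == 0 else (5 - r) for i, r in enumerate(responses)]

    # Assumption: learnability = items 4 and 10 (zero-based indices 3 and 9),
    # usability = the other eight items. Not confirmed by the abstract.
    learn = contrib[3] + contrib[9]
    usable = sum(contrib) - learn

    # Each subscale is rescaled to 0-100 so the three scores are comparable.
    return {
        "sus": sum(contrib) * 2.5,         # 10 items, max raw sum 40
        "learnability": learn * 12.5,      # 2 items, max raw sum 8
        "usability": usable * 3.125,       # 8 items, max raw sum 32
    }
```

Scores computed this way for real and simulated participants could then be compared across the three experimental settings; the statistical test used for that comparison is not stated in the abstract.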
Index Terms
- From Lab to Virtual: Comparing Real and AI-Generated User Interviews in Home Appliance Evaluation.