Rationally or emotionally: how should voice user interfaces reply to users of different genders considering user experience?

Original Article · Cognition, Technology & Work

Abstract

Voice user interfaces (VUIs) have exploded in popularity over the past three years, yet there has been little research on the reply methods VUIs can adopt when communicating with people. In this paper, we designed two studies with 20 participants to explore how reply methods influence user experience in two kinds of scenarios (application scenarios and command-giving) when using a VUI. We examined how three reply methods (fact-only, rational, and emotional) performed at different times and in different scenarios, and we investigated whether men and women evaluate replies differently and prefer different reply methods. A "Wizard of Oz" method was used in the experiments to simulate realistic communication between the participants and the VUI. We divided each reply into three parts (fact + judgment + strategy) and constructed the three reply methods from these parts. User experience was measured with quantitative ratings on five aspects (affection, confidence, naturalness, social distance, and satisfaction), preference selection, and an interview. The results indicated that participants preferred the reply methods that offered judgments and strategies (rational and emotional) in our experiment script, with the emotional style receiving the highest evaluation. In addition, male participants tended to rate the VUI's replies higher than female participants for all three reply methods in both application scenarios and command-giving. These results may contribute to the design of VUI replies.
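To make the fact + judgment + strategy construction concrete, the minimal Python sketch below shows one way the three reply methods could be assembled from those parts. The class, the templates, and the example sentences are hypothetical illustrations for this abstract, not the authors' actual experiment materials.

```python
from dataclasses import dataclass

# Hypothetical sketch: names, templates, and wording are illustrative
# assumptions, not the authors' actual experiment script.

@dataclass
class ReplyParts:
    """The three building blocks of a VUI reply (fact + judgment + strategy)."""
    fact: str       # objective information, e.g. a weather report
    judgment: str   # an evaluation of what the fact means for the user
    strategy: str   # a suggested course of action

def build_reply(parts: ReplyParts, method: str) -> str:
    """Assemble a reply in one of the three styles compared in the paper.

    fact-only -> states only the fact
    rational  -> fact plus a neutral judgment and strategy
    emotional -> the same parts wrapped in affective wording (assumed phrasing)
    """
    if method == "fact-only":
        return parts.fact
    if method == "rational":
        return f"{parts.fact} {parts.judgment} {parts.strategy}"
    if method == "emotional":
        softened = parts.strategy[0].lower() + parts.strategy[1:]
        return f"Oh! {parts.fact} {parts.judgment} Don't worry, {softened}"
    raise ValueError(f"unknown reply method: {method}")

# Example: a weather query, the kind of application scenario used in the studies.
weather = ReplyParts(
    fact="It is 2 °C outside with heavy rain.",
    judgment="It is quite cold and wet today.",
    strategy="You could take an umbrella and wear a warm coat.",
)
for method in ("fact-only", "rational", "emotional"):
    print(f"{method}: {build_reply(weather, method)}")
```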


Funding

This study was supported by the National Natural Science Foundation of China (NSFC, 72171015 and 72021001) and the Fundamental Research Funds for the Central Universities (YWF-21-BJ-J-314).

Author information

Corresponding authors

Correspondence to Ronggang Zhou or Zhe Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, Q., Zhou, R., Zhang, C. et al. Rationally or emotionally: how should voice user interfaces reply to users of different genders considering user experience?. Cogn Tech Work 24, 233–246 (2022). https://doi.org/10.1007/s10111-021-00687-8
