Abstract
Users increasingly expect voice assistants to recognize and follow principles of interpersonal communication in order to improve their emotional experience. We therefore study how personal pronouns should be used in the responses of voice assistants for Chinese users. We conducted a quantitative experiment in which the independent variable was the voice assistant's use of personal pronouns, with three levels: no personal pronouns, singular first-person pronouns, and singular second-person pronouns. Twenty-four participants listened to dialogues between users and a voice assistant and rated their emotional experience. The results show that a voice assistant's use of personal pronouns affects users' emotional experience: compared with responses that used no personal pronouns or first-person pronouns, responses phrased in the second person led users to trust the voice assistant more and to be more satisfied with its answers. These results can inform the design of voice interaction and suggest that response strategies for machines can be designed on the basis of interpersonal communication theory and pragmatics.
1 Introduction
Since the launch of the first Chinese-brand digital home assistants, such as Tmall Genie and the Du digital home assistant in 2017, the digital home assistant market in China has grown rapidly. According to statistics, sales of digital home assistants in China were expected to exceed 60 million units by the end of 2019 [1]. At the same time, Huawei, Xiaomi, and other mobile phone brands have launched voice assistants. Popular social platforms such as TikTok and Weibo show very high click-through rates on humorous videos about voice systems (for example, the hashtag #Xiaodu helps me turn on the air conditioner#), which demonstrates that voice interaction has gradually become part of people's daily life. However, in a usability test and interviews we conducted before this study (n = 100), we found that people interact with a digital home assistant frequently only when it is newly purchased; within a few days, most users merely use it to play music, check the weather, and so on, with little sustained, in-depth interaction.
Scholars in human-computer interaction have long focused on human-to-technology communication [2,3,4]. Cassell et al. proposed concentrating on social and linguistic intelligence, or "conversational smarts" [5], so as to make people more willing to use intelligent voice systems. In current research, a common method is to observe users' behavior in various ways, conduct conversation analysis, and then interview users, so as to provide designers with fresh and constructive perspectives. The research questions fall into two categories. One is to analyze how users respond to different situations in human-computer interaction, such as error-reporting scenarios [6, 7], interruption scenarios [2], and multi-party scenarios [4]. The other is to analyze how different types of users differ in usage patterns and emotional experience in voice interaction, for example by computer literacy [3, 8] and by user experience [8].
At the same time, it has been found that people use social language when interacting with powerful, idealized conversational interfaces [9]. Although people constantly adjust their language to suit interaction with a machine, this does not mean they lack higher expectations for the intelligence of the machine's speech [3]. For humans, the interpersonal function is one of the important functions of language [10], and different linguistic features (content features, physical attributes) convey different emotional expressions [11]. We therefore want to know how linguistic features can be used to improve the user's experience in human-computer interaction.
Our research brings together two fields: human-computer interaction and pragmatics. We focus on personal pronouns, a topic from pragmatics, to study the problems below. The paper first reviews the literature on pragmatics and personal pronoun use, and then reports an experiment in which the impact of pronouns on user experience is analyzed quantitatively.
2 Personal Pronouns
The use of personal pronouns affects the emotion conveyed by a sentence. A sentence is made up of two kinds of words: content words and function words. Function words, such as pronouns, prepositions, and conjunctions, connect and organize the content words [12]. The personal pronoun is an important function word that denotes a reference, and it includes first-person, second-person, and third-person pronouns [13]. Subtle changes in personal pronoun use carry rich emotional information and can be used to analyze a person's personality; for example, pronoun use in writing is related to psychological state [14]. By observing how people use function words such as personal pronouns, we can learn what they are thinking and how they relate to others [12]. This means that pronoun use affects how people relate to others. These findings have been confirmed across several domains. In marketing, carefully chosen personal pronouns help firms repair relationships with lost customers [15] and improve the perceived helpfulness of online reviews [16]. In marital relationships, the pronouns a spouse uses in conflict-resolution discussions provide insight into the quality of the interaction and the marriage [17]. The use of personal pronouns thus plays an important role in psychology, marketing, and close relationships; what these areas have in common is that they are all about improving human relationships, which is exactly what the field of human-computer interaction needs today. This paper analyzes the singular first-person pronoun and the singular second-person pronoun, that is, one-to-one conversation between the voice system and the user.
The same sentence can be expressed with either a first-person or a second-person pronoun. In a one-on-one conversation, the speaker can frame the same content from the other person's orientation or from his or her own, and this subtle difference shows up in the flexible use of pronouns. If an answer starts with "you", that is, the second-person pronoun is the subject, it is other-focused, whereas starting with the first-person pronoun "I" frames it from the speaker's perspective [18]. For example, in marital relationships, spouses who use more second-person pronouns behave more negatively in the interaction, while spouses who use more first-person plural pronouns propose more positive solutions to problems [17]. Likewise, when a user asks a voice assistant for advice on what to wear, the answer "A down jacket" provides the information directly, centering everything on the clothing recommendation; "I think it is ok to wear a down jacket" offers a suggestion from the voice assistant's perspective; and "You can wear a down jacket" is not a bare answer but shifts the focus from the voice assistant to the user, giving the user a recommendation for action.
Compared with omitting them, first-person pronouns make a statement appear biased and subjective. In online reviews, readers find such statements less efficient and therefore less helpful, and they perceive the content as less relevant [16]: the speaker seems to want to express personal views rather than provide the information the listener actually wants. For example, "Beijing cuisine includes..." sounds more objective than "I think Beijing cuisine includes...". Past studies of human-computer interaction have also found that, because voice assistants are non-human while "I" is a human word, a voice assistant that answers with "I" can cause cognitive dissonance in users [11].
The second-person pronoun can trigger self-reference [13], which enhances persuasion. Adding a second-person pronoun to a description strengthens the listener's self-reference. The central aspect of self-reference is that the self acts as a background or setting against which incoming data are interpreted or coded; this process involves an interaction between the individual's previous experience (in the form of an abstract self-structure) and the incoming material [19]. Self-reference prompts people to recall previous experience, which makes purchase behavior more likely and thus increases persuasion. When consumers receive information that is relevant to themselves, they respond more positively to the message, and this effect is more pronounced for declarative statements [20]; since voice assistants respond in declarative sentences, using second-person pronouns might be more convincing for them.
Our research therefore combines human-computer interaction and pragmatics to answer the following questions: (1) Should intelligent voice assistants use personal pronouns? (2) If so, which personal pronouns should they use? (3) How does the use of personal pronouns improve the user experience, and which aspects of users' emotional experience does it affect?
3 Method
We conducted a study of how the use of personal pronouns in voice assistant responses affects perceived emotional experience. We used a quantitative method: participants were asked to rate conversations they heard between a user and the voice system (described in more detail below).
The independent variable is the response mode of the voice system, with three levels: first-person singular pronoun, second-person singular pronoun, and no personal pronoun. Note that Chinese has two word forms for the second-person singular pronoun: the commonly used "ni" and the more respectful "nin".
Given that voice assistants are currently positioned primarily to serve users, the second-person singular level of the independent variable uses "nin" rather than "ni".
3.1 Experimental Material and Stimulus Preparation
The experimental material consists of pre-recorded voice conversations: one-to-one, single-round exchanges in which the user asks a question and the voice assistant answers. Two interaction scenarios, a weather query and a travel query, were selected for the experiment. Each scenario has a script with three dialogues, one for each level of the independent variable; see Table 1 for details.
The dialogues were finalized through pilot tests and internal discussion. The Baidu AI open platform was used to synthesize the dialogues between users and voice assistants in advance, at a medium speaking rate of about 280 words per minute; synthetic voices were used throughout. Participants gave their ratings after listening to each recorded conversation.
In addition to the independent variable, we controlled other variables. First, the personal pronoun always appears in the subject position, that is, at the beginning of the sentence, and no personal pronouns appear elsewhere in the sentence. Second, because real users may themselves address the voice system with various personal pronouns, the user's utterances in this study contain no personal pronouns.
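To make the stimulus structure concrete, the sketch below enumerates the 3 (pronoun level) × 2 (scenario) condition set. It is only an illustration: the English response texts paraphrase the dress-advice example from Sect. 2 and are not the Chinese scripts listed in Table 1.

```python
# Illustrative sketch of the 3 (pronoun level) x 2 (scenario) stimulus set.
# The response texts are hypothetical English paraphrases, not the exact
# Chinese scripts used in the experiment (see Table 1).
from itertools import product

PRONOUN_LEVELS = ["none", "first_person_singular", "second_person_singular"]
SCENARIOS = ["weather_query", "travel_query"]

# Example responses for the weather scenario; the travel scenario follows
# the same pattern with its own content.
EXAMPLE_RESPONSES = {
    ("weather_query", "none"): "A down jacket is suitable today.",
    ("weather_query", "first_person_singular"): "I think it is ok to wear a down jacket today.",
    ("weather_query", "second_person_singular"): "You (nin) can wear a down jacket today.",
}

# Enumerate all six experimental conditions (2 scenarios x 3 pronoun levels).
for scenario, level in product(SCENARIOS, PRONOUN_LEVELS):
    text = EXAMPLE_RESPONSES.get((scenario, level), "<scripted response from Table 1>")
    print(f"{scenario:13s} | {level:23s} | {text}")
```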
3.2 Design of Affective Experience Satisfaction Scale
Which aspects of user experience are affected by the use of personal pronouns? A literature review shows that several scales exist in human-computer interaction and user experience research, such as the AttrakDiff scale for user experience, the PAD mood scales, and the SUS scales for synthetic speech. However, the objective of this paper is to explore the experience of pronoun use, which involves linguistics, so indicators from interpersonal communication also need to be considered. This study therefore did not use an existing questionnaire but designed several questions to capture users' emotional experience.
Four user experience researchers and one user experience professor were invited to discuss and determine the measures of emotional experience; after discussion, five questions were designed to measure participants' subjective experience. First, a naturalness index was included, because the experimental material uses synthesized audio [21] and naturalness is an important measure for synthetic speech in voice interaction. Second, since the manipulation of personal pronouns comes from pragmatics, two indicators from research on interpersonal speech were selected: closeness and trust [22]; trust also plays an important role in human-computer interaction [24]. Third, the use of personal pronouns concerns enjoyment rather than practicality in user experience, so a fondness index was added [23]. Finally, a satisfaction index was added as an overall evaluation of users' emotional experience [23].
The dependent variable is user’s real feeling towards the voice system response, which is measured from the following five dimensions: “the response of voice response is pleasant”, “the response of voice system is natural”, “the response of voice system is trustworthy”, “the response of the voice system can close the distance between me and the voice system” and “the response of the voice system is satisfying”. Participants rated all cues by completing a 7-point Likert scale, from totally disagree to totally agree.
3.3 Experimental Procedure: Rating
The experiment was conducted in a quiet computer room, with a pair of headphones at each computer for participants to use. The procedure was as follows: participants first entered the E-Prime program, read the experimental instructions presented on the screen on their own, and then completed a practice stage (which could be repeated) before the formal experiment. Once they understood the procedure, participants were asked to rate the conversations between the user and the voice system according to their real feelings. The experimental instruction was as follows:
“Next, you will take part in a voice evaluation experiment. After the experiment starts, you will hear a dialogue between a user and a voice system; the male voice is the user and the female voice is the voice system. Please pay attention to the differences in content between conversations. At the end of each conversation, you will be asked to evaluate the dialogue on multiple dimensions and record your scores according to the instructions. The system will automatically move to the next page when you finish recording your scores.”
Dependent measures: after listening to each dialogue, participants rated their feelings on the five dimensions using a 7-point scale (1 = strongly disagree, 7 = strongly agree).
Task order: participants evaluated all three voice dialogues in one scene and then the dialogues in the other scene. The two dialogue scenes were presented in random order, and the three audio dialogues within each scene were also presented in random order. Participants gave their ratings after listening to each dialogue, as sketched below.
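The presentation order just described can be sketched as follows. This is a hypothetical reimplementation in Python for illustration only; the study itself ran the procedure in E-Prime.

```python
# Sketch of the randomized presentation order: the two scenes are presented
# in random order, and within each scene the three dialogues (one per
# pronoun level) are presented in random order. Hypothetical illustration;
# the actual study used E-Prime.
import random

SCENES = ["weather_query", "travel_query"]
PRONOUN_LEVELS = ["none", "first_person_singular", "second_person_singular"]

def presentation_order(seed: int) -> list:
    """Return the ordered list of (scene, pronoun_level) trials for one participant."""
    rng = random.Random(seed)
    scenes = SCENES[:]
    rng.shuffle(scenes)            # randomize scene order
    trials = []
    for scene in scenes:
        levels = PRONOUN_LEVELS[:]
        rng.shuffle(levels)        # randomize the three dialogues within a scene
        trials.extend((scene, level) for level in levels)
    return trials

# Example: trial order for one participant.
print(presentation_order(seed=7))
```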
The experiment passed the institution's ethical review, and participants signed an informed consent form before the experiment began. There were 24 participants: 11 men and 13 women, of whom 12 were working professionals and 12 were students. Participants were required to have at least used an intelligent voice assistant, and those with smart speaker experience were preferred. This was the final sample after 6 participants were excluded from the study as outliers.
4 Results
We used one-way analysis of variance (ANOVA) for the analysis. Overall, participants gave generally high scores to the emotional experience of the synthesized dialogues (every index scored 5.40 or higher), which is within an acceptable range. The data show that, on every emotional experience index, responses using "nin" received the highest scores, followed by responses with no personal pronoun and then responses with the first-person pronoun (Fig. 1).
On the trust index, the second-person singular condition (M = 6.13) scored significantly higher than the first-person singular condition (M = 5.63), F(2, 141) = 3.30, p = .040; responding in the second person thus makes it easier for the voice assistant to build trust with the user. On the fondness index, the mean for the second-person singular condition was 6.00, versus 5.60 with the first-person singular pronoun and 5.79 with no personal pronoun. On the naturalness index, the second-person singular condition scored 5.92, versus 5.44 with the first-person singular pronoun and 5.60 with no personal pronoun. On the closeness index, the second-person singular condition scored 5.85, versus 5.46 with the first-person singular pronoun and 5.48 with no personal pronoun.
In terms of overall satisfaction, the second-person singular condition (M = 6.15) scored significantly higher than the first-person singular condition (M = 5.60), F(2, 141) = 3.21, p = .043. This suggests that users are generally more satisfied with second-person responses, possibly because the voice assistant builds trust more readily when it responds in the second person.
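As an illustration of this analysis, the sketch below runs a one-way ANOVA on trust ratings with SciPy. The ratings are simulated (48 per condition, i.e., 24 participants × 2 scenarios, giving df = (2, 141)), with group means chosen to roughly match the reported pattern, so the resulting F and p values will not reproduce the reported statistics exactly.

```python
# Illustrative one-way ANOVA on trust ratings across the three pronoun
# conditions, mirroring the reported F(2, 141) test. The ratings are
# simulated (48 per condition: 24 participants x 2 scenarios), so the
# resulting F and p values only approximate those reported in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N_PER_GROUP = 48  # 24 participants x 2 scenarios

def simulate_ratings(mean: float) -> np.ndarray:
    """Simulate 7-point ratings around a group mean, clipped to the scale."""
    raw = rng.normal(loc=mean, scale=1.0, size=N_PER_GROUP)
    return np.clip(np.round(raw), 1, 7)

trust_none = simulate_ratings(5.8)     # no personal pronoun (assumed mean)
trust_first = simulate_ratings(5.63)   # first-person singular ("I")
trust_second = simulate_ratings(6.13)  # second-person singular ("nin")

f_stat, p_value = stats.f_oneway(trust_none, trust_first, trust_second)
print(f"F(2, {3 * N_PER_GROUP - 3}) = {f_stat:.2f}, p = {p_value:.3f}")
```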
5 Discussion
This paper studies the effect of personal pronouns on users' emotional experience in voice interaction. The results show that when personal pronouns appear in the voice system's response, users' emotional experience changes. In general, when the second-person pronoun was added to the assistant's response, users trusted the response more and were more satisfied with it. However, not all personal pronouns improve users' emotional experience: compared with using no personal pronouns, second-person pronouns do improve the experience, whereas adding first-person pronouns has an adverse effect. This is consistent with the literature reviewed earlier. The same sentence can begin with either a first-person or a second-person pronoun, the difference being whether it is framed from the speaker's or the listener's perspective. For a response beginning with the first-person pronoun, research in other fields shows, for one thing, that the information then contains the speaker's subjective view, which reduces its objectivity and distracts the user from the information itself. For another, previous human-computer interaction research has found that users regard machines as able to communicate but not as human beings; beginning an utterance with "I" implies that the speaker has an independent personality, which contradicts the identity of a machine, so a machine that says "I" can trigger cognitive dissonance in users. As for the second-person pronoun, from the perspective of pragmatics and psychology, presenting information in the second person triggers the listener's self-reference, activates memories of past experience, and thus improves the persuasiveness of the information.
This paper also examines why the second person can improve users' emotional experience and satisfaction. Users evaluated the voice assistant's responses on five aspects: naturalness, fondness, trust, closeness, and satisfaction, with satisfaction placed last as the overall evaluation. For responses beginning with second-person pronouns, first-person pronouns, or no personal pronouns, there were no significant differences in naturalness, fondness, or closeness; the differences appeared in trust and satisfaction. This suggests that satisfaction is higher because users place more trust in responses that use the second-person pronoun. Trust in automated systems has been extensively studied; it is one example of the important influence of affect and emotion on human-technology interaction, and of how affect-related considerations should shape the design of complex, high-consequence systems [24]. This paper further confirms the important role of trust in human-computer interaction design. For future design, we suggest that findings from pragmatics can be used to improve user experience, and that user trust should be taken into account when doing so.
6 Limitations
The experimental tasks in our quantitative experiment were drawn from two scenarios. To control the variables, the queries and responses were kept consistent across the two scenarios, which made some task content more verbose than everyday conversation. The sample was limited to 24 participants, all in China; the scope and scale of the sample will be expanded in future work.
References
Pelikan, H.R.M., Broth, M.: Why that Nao?: how humans adapt to a conventional humanoid robot in taking turns-at-talk. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 4921–4932. ACM (2016)
Beneteau, E., Richards, O.K., Zhang, M., Kientz, J.A., Yip, J., Hiniker, A.: Communication breakdowns between families and Alexa. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, p. 243. ACM (2019)
Luger, E., Sellen, A.: Like having a really bad PA: the gulf between user expectation and experience of conversational agents. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 5286–5297. ACM (2016)
Porcheron, M., Fischer, J.E., Sharples, S.: Do animals have accents?: talking with agents in multi-party conversation. In: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 207–219. ACM (2017)
Cassell, J., Bickmore, T., Campbell, L., Vilhjálmsson, H., Yan, H.: Conversation as a system framework: designing embodied conversational agents. In: Embodied Conversational Agents. MIT Press, Cambridge (2000)
Jiang, J., Jeng, W., He, D.: How do users respond to voice input errors? Lexical and phonetic query reformulation in voice search. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 143–152. ACM (2013)
Murray, I.R., Arnott, J.L.: Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J. Acoust. Soc. Am. 93(2), 1097–1108 (1993)
Chen, M., Wang, H.: How personal experience and technical knowledge affect using conversational agents. In: Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion, p. 53. ACM (2018)
Large, D.R., Clark, L., Quandt, A., Burnett, G., Skrypchuk, L.: Steering the conversation: a linguistic exploration of natural language interactions with a digital assistant during simulated driving. Appl. Ergon. 63, 53–61 (2017)
Liu, J., Shu, N., Zhang, Q.: Interpersonal interpretation of personal pronoun in marriage advertising. Res. J. Eng. Lang. Lit. 3(1), 18–25 (2015)
Nass, C.I., Brave, S.: Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. MIT Press, Cambridge (2005)
Pennebaker, J.W.: The secret life of pronouns. New Sci. 211(2828), 42–45 (2011)
Levinson, S.C.: Pragmatics. Cambridge University Press, Cambridge (1983)
Campbell, R.S., Pennebaker, J.W.: The secret life of pronouns: flexibility in writing style and physical health. Psychol. Sci. 14(1), 60–65 (2003)
Packard, G., Moore, S.G., McFerran, B.: (I’m) Happy to help (you): the impact of personal pronoun use in customer-firm interactions. J. Mark. Res. 55(4), 541–555 (2018)
Wang, F., Karimi, S.: This product works well (for me): The impact of first-person singular pronouns on online review helpfulness. J. Bus. Res. 104, 283–294 (2019)
Simmons, R.A., Gordon, P.C., Chambless, D.L.: Pronouns in marital interaction: what do “you” and “I” say about marital health? Psychol. Sci. 16(12), 932–936 (2005)
Ickes, W., Reidhead, S., Patterson, M.: Machiavellianism and self-monitoring: as different as “me” and “you”. Soc. Cogn. 4(1), 58–74 (1986)
Rogers, T.B., Kuiper, N.A., Kirker, W.S.: Self-reference and the encoding of personal information. J. Pers. Soc. Psychol. 35(9), 677 (1977)
Escalas, J.E.: Self-referencing and persuasion: narrative transportation versus analytical elaboration. J. Consum. Res. 33(4), 421–429 (2006)
Adiga, N., Prasanna, S.R.M.: Acoustic features modelling for statistical parametric speech synthesis: a review. IETE Tech. Rev. 36(2), 130–149 (2019)
Street, R.L.: Evaluation of noncontent speech accommodation. Lang. Commun. 2(1), 13–31 (1982)
Brill, T.M., Munoz, L., Miller, R.J.: Siri, Alexa, and other digital assistants: a study of customer satisfaction with artificial intelligence applications. J. Mark. Manag. 35(15–16), 1401–1436 (2019)
Lee, J.D., See, K.A.: Trust in automation: designing for appropriate reliance. Hum. Factors 46(1), 50–80 (2004)