DOI: 10.1145/3704522.3704527 · NSysS Conference Proceedings · Research Article · Open Access

Enhancing EmoBot: An In-Depth Analysis of User Expectation and Satisfaction in an Emotion-Aware Chatbot

Published: 03 January 2025

Abstract

The research community has traditionally concentrated on emotion detection in emotion modeling, while emotion generation has garnered less focus. With the rise of artificial intelligence, numerous chatbots have been developed, but there remains a lack of standardized methods to evaluate them. In this study, we evaluate EmoBot, a state-of-the-art emotionally aware chatbot, through the lens of user expectations. By collecting and analyzing user feedback, we identify EmoBot’s strengths and weaknesses, providing a basis for future enhancements. We propose targeted improvements to enhance its user experience, especially in areas of underperformance, and offer a framework for assessing emotionally aware chatbots based on user expectations and experiences.

1 Introduction

In recent years, the rapid advancement of artificial intelligence has propelled social chatbots to the forefront of research and development [13]. Unlike their earlier counterparts, which relied on static, rule-based systems, modern chatbots utilize sophisticated deep learning algorithms [2]. This evolution has significantly improved their conversational capabilities, allowing them to engage in more fluid and natural interactions. However, for chatbots to truly resonate with users and deliver meaningful experiences, it is crucial to understand and align with user expectations and needs [33]. This alignment is often difficult to achieve because user expectations vary widely depending on the context, creating a gap between what users expect and what they experience when interacting with chatbots [31].
Several studies have been conducted to explore what users seek in their interactions with chatbots. One such study revealed that, while users currently enjoy the entertainment aspect of chatbots, there is growing demand for chatbots that exhibit more human-like qualities and emotional intelligence [28]. This shift in user expectations has prompted researchers to move beyond simple emotion detection and focus on how chatbots can respond with appropriate emotional intelligence [29]. Despite these advancements, creating emotionally intelligent chatbots comes with its own set of challenges [9]. One of the most significant hurdles is accurately detecting emotions [24, 26]. While sentiment analysis focuses on identifying the overall tone or attitude in a conversation, emotion detection aims to pinpoint the user’s exact emotional state, which is a far more complex task [20]. For a chatbot to be emotionally intelligent, it must excel in both sentiment analysis and emotion detection [7].
Furthermore, although chatbots are now widely implemented across various platforms, there is still no universally accepted standard for evaluating their performance [8]. Some evaluation metrics have been proposed [22], but human judgment remains an essential component in assessing the quality of chatbot interactions [32]. This lack of standardization highlights the need for more comprehensive evaluation frameworks that can assess both technical and emotional aspects of chatbot performance. In light of these challenges, our study explores EmoBot, an advanced emotional chatbot designed using cognitive appraisal theory [10, 11]. EmoBot generates contextually appropriate emotional responses based on user input, aiming to close the gap between user expectations and their experience with the chatbot. Through this study, we aim to answer three key research questions, focusing on the chatbot’s emotional responsiveness, its ability to meet user expectations, and potential areas for improvement.
RQ 1: What specific expectations do users hold when interacting with an emotion-generating chatbot?
RQ 2: How does a chatbot based on cognitive appraisal theory measure up to these user expectations?
RQ 3: To what degree can we enhance and improve the chatbot using user feedback and empirical data?
Through an analysis of EmoBot’s strengths and weaknesses, we aim to provide valuable insights for enhancing its emotional intelligence and overall user experience. Our research makes the following contributions to the field of Human-Computer Interaction (HCI):
We conduct a pre-interaction survey to assess user expectations for an emotion-generating chatbot, offering insights into user perceptions and desires that are critical for HCI design.
By evaluating post-interaction feedback, we identify the gaps between user expectations and actual experiences, shedding light on areas where the chatbot’s emotional responses succeed or fall short.
Based on these findings, we suggest specific adjustments to the chatbot’s emotional response mechanisms, aimed at improving the alignment between its behavior and user expectations, ultimately enhancing the HCI experience.

2 Background and Related Work

In this section, we explore the landscape of emotionally aware chatbots, examining their development and features. Additionally, we review how these chatbots have been assessed in different studies, highlighting the various evaluation methods and metrics used to gauge their effectiveness and emotional responsiveness.

2.1 Emotionally Aware ChatBots and EmoBot

In 2019, Asma Ghandeharioun et al. created EMMA, an emotionally aware mHealth agent that provided empathetic, emotionally appropriate micro-activities during a two-week study with 39 participants [14]. In 2022, Rhio Sutoyo et al. presented a model designed to interpret emotions in Indonesian customer product reviews, utilizing a feedforward neural network. The effectiveness of this model was evaluated through experiments that focused on hyperparameter tuning and transfer learning [27]. A similar methodology was employed by Achini Adikari et al. in 2019, who developed a model for extracting emotions from conversations and monitoring emotional transitions over time. This model utilized Markov Chains, word embedding, and Natural Language Processing, validated with data from real-world end-users [1].
In 2022, Ekaterina Svikhnushina et al. introduced the PEACE model (Politeness, Entertainment, Attentive Curiosity, Empathy), which identifies essential social qualities of chatbots based on findings from large-scale surveys and structural equation modeling [30]. Most recently, in January 2024, Md Ehtesham-Ul-Haque et al. launched EmoBot, an emotional chatbot capable of assessing emotion-generating events and recognizing six primary emotions through continuous audio and text inputs. EmoBot computes information variables to evaluate situations and generate contextually relevant emotional responses [10].

2.2 Evaluation of ChatBots

Despite the wide range of chatbots currently available, there remains a significant lack of standardization in their evaluation. This issue is further complicated by the diverse contexts and fields in which these chatbots are deployed [18], resulting in varying evaluation criteria for different types. In 2018, Dijana Peras identified several categories for an evaluation framework, including Usability, Performance, Affect, Satisfaction, Accuracy, Accessibility, Efficiency, Quality, Quantity, Relation, Manner, Grammatical Accuracy, Humanity, and Business Value [22]. Peras’s study also reviewed the metrics used to gauge chatbot success. In 2019, Sedoc et al. developed a unified framework for the human evaluation of chatbots, which enhanced existing tools and established a web-based platform for comparing interfaces with baseline models and previous research [25].
Additionally, Ren, Ranci, and Castro conducted a systematic mapping study in 2019 to categorize various evaluation techniques documented in the chatbot literature [23]. They found that the field of chatbot usability is still nascent, with most studies consisting of surveys or informal experimental research. Nonetheless, these studies are important as they offer valuable insights for developing guidelines that prioritize usability in design.
Given these discussions, it is evident that no research currently standardizes methods for evaluating emotionally aware chatbots. Therefore, in this study, we will first gather user expectations, then evaluate the state-of-the-art EmoBot, and finally propose enhancements based on user feedback.

3 Methodology

In Bangladesh, a considerable number of individuals have either engaged directly with chatbots or are at least aware of their existence [4]. Numerous businesses, both large and small, along with online retailers, employ chatbots for customer interactions [3]. While the majority of these chatbots are not custom-built and are offered by platforms like Facebook Marketplace, there are also some instances of custom-developed solutions [21]. To gain a comprehensive understanding of user expectations regarding chatbots, it is crucial to conduct a qualitative analysis of EmoBot’s usability as well.
The study’s methodology is designed to begin with participant sampling, followed by data collection from the selected sample and culminating in both qualitative and quantitative analyses. Figure 1 illustrates a flowchart that outlines the structure of our research. With the research questions already defined, this section will concentrate on discussing several additional topics.
Figure 1: Methodology

3.1 Participants

The study involved 16 participants in total: 9 female and 7 male. This number was determined by the study’s analytical approach and objectives, aligning with the saturation level [19] deemed adequate for the research. Participants were selected through snowball sampling [6]. During selection, we aimed to strike a balance between participants with prior chatbot experience and those without, to avoid introducing bias [12] toward either familiarity with chatbots or unfamiliarity with emerging technologies. Of the participants, 10 were young adults aged 20-30, 4 were aged 31-40, and 2 were aged 41-50. All participants were from Bangladesh and fluent in English. Each reported regular use of a computer or smartphone in daily life. 62% (10 of 16) had prior exposure to some form of chatbot, while the remaining 38% had either never heard of chatbots or never used one. Additionally, 12% (2 of 16) reported using intelligent assistants such as Google Assistant.
Gender: Male 7, Female 9
Age: 20-30 years 10, 31-40 years 4, 41-50 years 2
Ever used chatbots: Yes 10, No 6
Used intelligent assistants: Yes 2, No 14
Table 1: Demographics of the participants

3.2 Data Collection

We employed a semi-structured interview approach to explore user expectations and concerns. Initially, we asked basic questions to determine whether users had prior experience with chatbots, what general expectations they held regarding chatbots, and whether they believed chatbots provide any value. Next, we presented a detailed document that explained EmoBot’s capabilities in layman’s terms. Following this, we prepared a questionnaire with specific questions about user expectations from EmoBot, focusing on aspects such as usability, satisfaction, accuracy, and humanity, along with emotional responses such as fear, joy, sadness, surprise, and anger. We collected the following metrics from participants:
Relevance: Assessed how relevant the chatbot’s response was to the context of the prompt.
Satisfaction: Measured the level of satisfaction the user felt with the chatbot’s response.
Accuracy: Evaluated whether the chatbot’s reply conveyed the correct emotion in line with the prompt. Accuracy was also measured for each specific emotion separately.
Appropriateness: Assessed the human-like quality of the chatbot’s response.
The remaining steps of the data collection procedure were structured around pre-interaction and post-interaction phases with EmoBot.

3.2.1 Pre-Interaction Interview:

We first conducted a pre-interaction interview with users to collect quantitative and qualitative data regarding their expectations of the chatbot. Participants rated each metric on a scale from 0 to 10, indicating what they anticipated in terms of the chatbot’s emotional capabilities, conversational fluidity, and overall user experience. This initial data provided a baseline understanding of user expectations, which we later compared against post-interaction feedback.

3.2.2 Post-Interaction Interview:

After users interacted with EmoBot, we conducted a post-interaction interview to assess their experiences. This phase collected quantitative data in the form of user ratings on the same metrics as the pre-interaction interview. Users were asked to rate their experience with EmoBot on a scale from 0 to 10 on each of the metrics for assessing its performance. We also collected descriptive feedback on how users’ experiences aligned with their initial expectations before interacting with the chatbot. In instances where a user gave an unusually low rating on any particular metric, we followed up by asking them to explain the reasons behind their rating to get an understanding of how the chatbot failed to meet their expectation on that metric.
Users interacted with EmoBot through both audio and text-based formats. The average duration of the audio chats was two minutes, while the textual interactions averaged six minutes. Participants were given guidance on how to prompt the chatbot to elicit responses corresponding to the five emotions: fear, joy, sadness, surprise, and anger.

3.3 Data Analysis

Based on the user ratings, we conducted a thorough quantitative analysis of the data. For each metric, we first calculated the mean rating for both the expectation values (collected before the interaction) and the experience values (collected after the interaction). By determining the difference between the mean expectation ratings and the mean experience ratings, we were able to assess how well the chatbot met user expectations across various metrics. This comparison allowed us to visualize the chatbot’s performance in relation to each specific metric.
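As a concrete sketch, the gap computation described above can be reproduced from the average ratings reported in Table 2; the metric names and values come from this paper, while the code itself is only an illustration of the analysis, not the scripts we actually used:

```python
# Gap analysis sketch: mean expectation rating minus mean experience rating
# per metric, using the average ratings reported in Table 2.
ratings = {
    # metric:         (expectation, experience)
    "Relevance":       (8.3, 4.1),
    "Satisfaction":    (6.4, 3.8),
    "Accuracy":        (8.9, 6.8),
    "Appropriateness": (8.7, 5.4),
}

# Positive gap = experience fell short of expectation by that many points.
gaps = {m: round(exp - obs, 1) for m, (exp, obs) in ratings.items()}

# List metrics from largest to smallest shortfall.
for metric, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{metric:>15}: expectation-experience gap = {gap}")
```

On the Table 2 values, Relevance shows the largest gap (4.2 points), which matches the discrepancy discussed in the Findings section.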
Additionally, we calculated the average accuracy ratings provided by users for each of the five emotions EmoBot aimed to convey. This analysis helped us evaluate the chatbot’s effectiveness in generating accurate emotional responses across the emotional spectrum.
Beyond the quantitative analysis, we also performed a descriptive analysis of the qualitative feedback from users. This included examining comments on their experiences, reasons for unusually low ratings, and general insights about the chatbot’s emotional capabilities. The combination of these analyses provided a clearer understanding of EmoBot’s strengths and areas needing improvement, which will be discussed in detail in the following section.

3.4 Ethical Concerns

Before involving human volunteers to collect any data, the study obtained approval from the Institutional Review Board (IRB) of Brac University, ensuring adherence to ethical standards. This approval process safeguards the rights, privacy, and safety of all participants throughout the research. As a result, the study’s findings are strengthened in terms of credibility and reliability, having undergone thorough ethical validation.
Ethical considerations were prioritized throughout the interview process. Participation was explicitly stated as being voluntary, and individuals’ privacy was carefully protected. Following protocols established after obtaining informed consent, participants’ identities and responses were kept confidential and anonymous. By following these procedures and securing the necessary institutional review board approvals, the study maintained strict compliance with ethical guidelines.
Figure 2: Discrepancy in expectation and experience
Figure 3: Accuracy in generating emotion
            Relevance  Satisfaction  Accuracy  Appropriateness
Expectation    8.3         6.4          8.9          8.7
Experience     4.1         3.8          6.8          5.4
Table 2: Average rating of participants against each metric

4 Findings

From Table 2, we observe that users primarily expected EmoBot to be accurate in generating emotions. Accuracy received an average expectation rating of 8.9, the highest among all four metrics. This is likely because EmoBot was introduced as an emotionally aware chatbot, leading users to naturally expect it to generate accurate emotional responses. Among the metrics measured, EmoBot also performed best in terms of Accuracy, receiving an experience rating of 6.8 out of 10, the highest among all metrics.
However, user expectations for Satisfaction were significantly lower than for the other metrics: it received a rating of only 6.4, against an average of 8.07 across all four expectation ratings. When asked about their low expectations, many users attributed them to past experiences with chatbots that had left them dissatisfied, thereby lowering their satisfaction threshold. Similarly, EmoBot’s actual Satisfaction score was also quite low, at just 3.8 out of 10, the lowest rating across all metrics.
Another interesting observation from Figure 2 is that the metric with the highest discrepancy between user expectations and actual experience was Relevance. This suggests that while EmoBot excelled at generating accurate emotions, its responses were not always contextually relevant to the prompts. This lack of contextual relevance likely contributed to the low Satisfaction score, as users are unlikely to be fully satisfied if the responses they receive are not appropriately aligned with the conversation.
Moreover, Figure 3 highlights that EmoBot struggled significantly with generating the emotion Surprise, while it was most accurate when generating Joy. Upon examining the underlying emotion generation model of the chatbot, we found that it relied on mapping informative variables. This suggests that the model could potentially be fine-tuned by adjusting these mappings to improve the generation of Surprise and create a better balance across the five primary emotions.
Based on these results, we now answer each of the research questions presented at the start of this paper.

4.1 User Expectations from An Emotion-Generating Chatbot (RQ1)

One of the primary research goals of this study was to understand user expectations when interacting with an emotion-generating chatbot. The findings revealed that accuracy was the most expected feature. Users expected the chatbot to respond with emotions that accurately aligned with the conversation. This emphasis on accuracy is logical, given that the chatbot’s core feature is emotion generation, leading users to prioritize emotional precision. Following accuracy, the second most valued expectation was appropriateness, with users desiring the chatbot’s responses to be as human-like as possible. This expectation aligns with the nature of an emotion-generating chatbot, as users naturally want the emotional responses to be conveyed in a realistic, human manner.
Interestingly, the study also uncovered an unexpected result: users had relatively low expectations for satisfaction, with an average expectation rating of only 6.4 out of 10. This contrasts with the high expectations for accuracy and appropriateness. Many users attributed this skepticism to prior negative experiences with chatbots. Despite high expectations for performance, users remained doubtful that their interactions would ultimately satisfy them. Two participants commented:
I don’t think just proper emotion can satisfy me. It will have to be as good as modern AI like ChatGPT for that. - P1
It is very difficult for me to chat with online shop chatbots. - P2
The opinions expressed by P1 and P2 indicate that their low expectations originate from prior experiences with chatbots. P1, for instance, holds a higher standard for satisfaction due to previous interactions with modern, LLM-based chatbots. P2, on the other hand, is more skeptical because of past negative experiences with similar chatbots.

4.2 Chatbot Based on Cognitive Appraisal Theory against User Expectations (RQ2)

The second goal of this study was to assess how well a chatbot based on the cognitive appraisal theory of emotions performs against user expectations. According to the user ratings in Table 2, EmoBot was most successful in producing responses with accurate emotions, a positive outcome for an emotion-generating chatbot. However, across the other metrics, the chatbot’s overall performance was less impressive from the users’ perspective; user satisfaction received the lowest score. While accurately generating emotions is a key success factor for EmoBot, it is equally important for it to achieve higher satisfaction ratings. Ultimately, if users are not satisfied with the responses, there is room for improvement in the chatbot’s design and functionality.

4.3 Enhancements Based on User Feedback (RQ3)

In our study, we observed that EmoBot struggles to effectively generate the emotion of surprise. EmoBot’s emotional responses are based on cognitive appraisal theory, using specific appraisal variables to evaluate events. The resulting emotion is produced through calculations that map these appraisal variables to five primary emotions. Among the various techniques discussed in EmoBot’s development, the authors’ mapping approach was selected.
The values that map appraisal variables to emotions can be adjusted to better reflect different emotions or appraisal variables, and such adjustments would directly impact how EmoBot generates specific emotions. Based on the user feedback shown in Figure 3, it is clear that EmoBot is not equally proficient in generating all five emotions. We therefore conclude that minimal modifications to EmoBot’s model could fine-tune its emotion generation process and improve its performance.
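To make the proposed adjustment concrete, the sketch below illustrates one way such a mapping could be rebalanced. The appraisal variable names, the weight values, and the linear scoring scheme are all hypothetical placeholders for illustration only; EmoBot’s actual appraisal variables and mapping are described in [10]:

```python
# Hypothetical sketch of rebalancing an appraisal-variable-to-emotion mapping.
# The variable names and weights below are illustrative assumptions, not
# EmoBot's published values.
EMOTIONS = ["fear", "joy", "sadness", "surprise", "anger"]
APPRAISALS = ["desirability", "expectedness", "agency"]  # assumed variables

# One weight row per emotion, one column per appraisal variable.
weights = {
    "fear":     [-0.6,  0.2,  0.1],
    "joy":      [ 0.8,  0.3,  0.0],
    "sadness":  [-0.7,  0.1, -0.2],
    "surprise": [ 0.0, -0.9,  0.0],  # driven mainly by low expectedness
    "anger":    [-0.5,  0.2,  0.8],
}

def emotion_scores(appraisal, w):
    """Linearly map an appraisal vector to a score per emotion."""
    return {e: sum(a * b for a, b in zip(row, appraisal)) for e, row in w.items()}

# Fine-tuning as proposed above: scale the mapping row of the underperforming
# emotion (here, surprise) so it competes more strongly during generation.
tuned = {e: row[:] for e, row in weights.items()}
tuned["surprise"] = [v * 1.5 for v in tuned["surprise"]]

event = [0.0, -1.0, 0.0]  # a highly unexpected, otherwise neutral event
base = emotion_scores(event, weights)
boosted = emotion_scores(event, tuned)
```

Under this toy mapping, the unexpected event already scores highest on surprise, and scaling its row raises that score by 50%, the kind of targeted rebalancing that the imbalance in Figure 3 suggests is needed.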

5 Discussion

In this section, we discuss how our study contributes to the state-of-the-art in emotionally aware chatbots and how it paves the way for future improvements to EmoBot.

5.1 Advancing HCI: Evaluating EmoBot Through User Expectations

Our study significantly advances the HCI literature by focusing on EmoBot, an emotionally aware chatbot that sets itself apart as a state-of-the-art model. While previous research has evaluated various chatbots [10], including those lacking emotional awareness, none have specifically examined EmoBot from the perspective of user expectations. Understanding user needs and expectations is fundamental to the development of any technology, as user satisfaction can greatly influence the overall success of a system, even when other factors such as budget and timeline constraints are not met [5]. Ultimately, a model’s effectiveness hinges on its ability to fulfill user requirements [15].
In our study, we gathered and analyzed data reflecting user expectations to pinpoint both the strengths and weaknesses of EmoBot. By identifying areas where EmoBot excels and where enhancements are needed, we contribute valuable insights to the HCI field. Furthermore, our detailed examination of specific emotional criteria reveals the emotions with which the chatbot faces the most challenges. This targeted analysis not only informs the immediate improvements necessary to boost EmoBot’s performance but also lays the groundwork for future research in emotionally aware technology.
By integrating user feedback into our evaluation, we align our study with HCI’s commitment to user-centered design, emphasizing the importance of incorporating user experiences in the development of intelligent systems. Our findings can inspire future research efforts to refine chatbot technologies and establish standardized evaluation frameworks, thereby enriching the body of knowledge in HCI and contributing to the design of more effective and emotionally responsive systems.

5.2 Scoping Out The Areas for Improvement

In this study, we identify two critical areas for enhancing EmoBot’s performance: refining the mapping of appraisal variables to achieve a more balanced emotion generation and integrating contextual understanding into the chatbot’s framework. From an HCI perspective, achieving a more consistent balance in emotional responses is essential for improving user-friendliness. Currently, as depicted in Figure 3, the disparities between various emotions are too pronounced. This imbalance can lead to confusion or frustration for users, negatively impacting their overall experience. By addressing these discrepancies and enhancing the chatbot’s emotional range, we can significantly improve the perceived accuracy of EmoBot’s responses. This would foster a more engaging interaction, allowing users to feel that the chatbot understands and responds appropriately to their emotional states.
Moreover, increasing user satisfaction remains a primary challenge for EmoBot. To tackle this, we advocate for the incorporation of contextual understanding within the chatbot. Context-aware chatbots can provide responses that are not only relevant to the immediate conversation but also sensitive to the user’s past interactions and emotional states. Research indicates that chatbots leveraging context significantly enhance user satisfaction, particularly in light of advancements in large language models, which can analyze and retain contextual information over multiple exchanges [16]. However, integrating context into chatbots is fraught with challenges, such as ensuring data privacy and maintaining computational efficiency [17].
Addressing these challenges and implementing contextual awareness represents a promising area for further exploration and development in the HCI domain. By focusing on these improvements, EmoBot can evolve into a more sophisticated and emotionally intelligent chatbot, ultimately enhancing user interaction and satisfaction.

6 Conclusion, Limitations and Future Work

In this study, we analyzed EmoBot, a cutting-edge emotionally aware chatbot, to identify the necessary changes for enhancing its user-friendliness. Our analysis focused on user perspectives, assessing various metrics to evaluate EmoBot’s performance in relation to user expectations. We also categorized our findings across the five primary emotions that EmoBot can generate. This analysis serves as a foundation for future improvements to the chatbot.

6.1 Limitations

One significant limitation of this study is the absence of a clearly defined context for evaluating user interactions with EmoBot. The chatbot’s use cases and target audience remain unclear, resulting in a diverse selection of participants from various occupations and age groups. As a result, the metrics derived from this varied sample provide a generalized assessment. Additionally, the relatively small sample size highlights the need for a larger and more focused participant pool, which could yield more specific insights, particularly if a target audience is identified.
Furthermore, the study did not investigate the long-term effects of user interactions with EmoBot, leaving unresolved questions about how prolonged usage may influence user satisfaction and emotion recognition. The current evaluation relied on short-term user feedback, which may not accurately reflect the chatbot’s performance during extended interactions.

6.2 Future Work

In future studies, we aim to expand user acceptance testing by engaging diverse audience groups, including individuals from varying age ranges, cultural backgrounds, and professional fields. This broader approach will enable us to explore how different demographics interact with emotionally aware chatbots like EmoBot, yielding more comprehensive insights into its effectiveness and appeal across various user profiles. A primary objective will be to determine which specific groups—defined by age, profession, or emotional needs—are most likely to benefit from EmoBot.
Once we identify the ideal target audience, subsequent work will focus on fine-tuning EmoBot to address the specific needs of that audience. Tailoring EmoBot’s emotional responses based on the unique characteristics of different user groups could significantly enhance user satisfaction and engagement.

References

[1] Achini Adikari, Daswin De Silva, Damminda Alahakoon, and Xinghuo Yu. 2019. A cognitive model for emotion awareness in industrial chatbots. In 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), Vol. 1. IEEE, 183–186.
[2] Md Al-Amin, Mohammad Shazed Ali, Abdus Salam, Arif Khan, Ashraf Ali, Ahsan Ullah, Md Nur Alam, and Shamsul Kabir Chowdhury. 2024. History of generative Artificial Intelligence (AI) chatbots: past, present, and future development. arXiv preprint arXiv:2402.05122 (2024).
[3] Rezwanul Alam Alam, Md Aminul Islam, and Arifur Khan. 2019. Usage of Chatbot as a New Digital Communication Tool for Customer Support: A Case Study on Banglalink™. Bangladeshi Case Study Series (Part 2) (2019).
[4] Kudrat-E-Khuda Babu. 2021. Artificial intelligence, its applications in different sectors and challenges: Bangladesh context. Artificial Intelligence in Cyber Security: Impact and Implications: Security Challenges, Technical and Ethical Issues, Forensic Investigative Challenges (2021), 103–119.
[5] Muneera Bano, Didar Zowghi, and Francesca da Rimini. 2017. User satisfaction and system success: an empirical exploration of user involvement in software development. Empirical Software Engineering 22 (2017), 2339–2372.
[6] Patrick Biernacki and Dan Waldorf. 1981. Snowball sampling: Problems and techniques of chain referral sampling. Sociological Methods & Research 10, 2 (1981), 141–163.
[7] Ghazala Bilquise, Samar Ibrahim, and Khaled Shaalan. 2022. Emotionally intelligent chatbots: a systematic literature review. Human Behavior and Emerging Technologies 2022, 1 (2022), 9601630.
[8] Jacky Casas, Marc-Olivier Tricot, Omar Abou Khaled, Elena Mugellini, and Philippe Cudré-Mauroux. 2020. Trends & methods in chatbot evaluation. In Companion Publication of the 2020 International Conference on Multimodal Interaction. 280–286.
[9] Anna Chizhik and Yulia Zherebtsova. 2020. Challenges of Building an Intelligent Chatbot. In IMS. 277–287.
[10] Md Ehtesham-Ul-Haque, Jacob D’Rozario, Rudaiba Adnin, Farhan Tanvir Utshaw, Fabiha Tasneem, Israt Jahan Shefa, and ABM Alim Al Islam. 2024. EmoBot: Artificial emotion generation through an emotional chatbot during general-purpose conversations. Cognitive Systems Research 83 (2024), 101168.
[11] Phoebe C. Ellsworth. 1991. Some implications of cognitive appraisal theories of emotion. (1991).
[12] Dirk M. Elston. 2021. Participation bias, self-selection bias, and response bias. Journal of the American Academy of Dermatology (2021).
[13] Asbjørn Følstad, Theo Araujo, Effie Lai-Chong Law, Petter Bae Brandtzaeg, Symeon Papadopoulos, Lea Reis, Marcos Baez, Guy Laban, Patrick McAllister, Carolin Ischen, et al. 2021. Future directions for chatbot research: an interdisciplinary research agenda. Computing 103, 12 (2021), 2915–2942.
[14] Asma Ghandeharioun, Daniel McDuff, Mary Czerwinski, and Kael Rowan. 2019. EMMA: An emotion-aware wellbeing chatbot. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 1–7.
[15] Jillian R. Griffiths, Frances Johnson, and Richard J. Hartley. 2007. User satisfaction as a measure of system performance. Journal of Librarianship and Information Science 39, 3 (2007), 142–152.
[16] Pei-Fang Hsu, Tuan Nguyen, Chen-Ya Wang, and Pei-Ju Huang. 2023. Chatbot commerce—How contextual factors affect chatbot effectiveness. Electronic Markets 33, 1 (2023), 14.
[17] Che Liu, Junfeng Jiang, Chao Xiong, Yi Yang, and Jieping Ye. 2020. Towards building an intelligent chatbot for customer service: Learning to respond at the appropriate time. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3377–3385.
[18] Bei Luo, Raymond Y.K. Lau, Chunping Li, and Yain-Whar Si. 2022. A critical review of state-of-the-art chatbot designs and applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12, 1 (2022), e1434.
[19] Kirsti Malterud, Volkert Dirk Siersma, and Ann Dorrit Guassora. 2016. Sample size in qualitative interview studies: guided by information power. Qualitative Health Research 26, 13 (2016), 1753–1760.
[20] Pansy Nandwani and Rupali Verma. 2021. A review on sentiment analysis and emotion detection from text. Social Network Analysis and Mining 11, 1 (2021), 81.
[21] Tasnim Dewan Orin et al. 2017. Implementation of a Bangla chatbot. Ph.D. dissertation. BRAC University.
[22] Dijana Peras. 2018. Chatbot evaluation metrics. Economic and Social Development: Book of Proceedings (2018), 89–97.
[23] Ranci Ren, John W. Castro, Silvia T. Acuña, and Juan De Lara. 2019. Evaluation techniques for chatbot usability: A systematic mapping study. International Journal of Software Engineering and Knowledge Engineering 29, 11n12 (2019), 1673–1702.
[24] Kashfia Sailunaz, Manmeet Dhaliwal, Jon Rokne, and Reda Alhajj. 2018. Emotion detection from text and speech: a survey. Social Network Analysis and Mining 8, 1 (2018), 28.
[25] Joao Sedoc, Daphne Ippolito, Arun Kirubarajan, Jai Thirani, Lyle Ungar, and Chris Callison-Burch. 2019. ChatEval: A tool for chatbot evaluation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 60–65.
[26] Armin Seyeditabari, Narges Tabari, and Wlodek Zadrozny. 2018. Emotion detection in text: a review. arXiv preprint arXiv:1806.00674 (2018).
[27] Rhio Sutoyo, HLHS Warnars, Sani Muhamad Isa, and Widodo Budiharto. 2023. Emotionally aware chatbot for responding to Indonesian product reviews. International Journal of Innovative Computing, Information and Control 19, 3 (2023), 861.
[28] Ekaterina Svikhnushina, Alexandru Placinta, and Pearl Pu. 2021. User expectations of conversational chatbots based on online reviews. In Proceedings of the 2021 ACM Designing Interactive Systems Conference. 1481–1491.
[29]
Ekaterina Svikhnushina and Pearl Pu. 2020. Social and emotional etiquette of chatbots: a qualitative approach to understanding user needs and expectations. arXiv preprint arXiv:https://arXiv.org/abs/2006.13883 (2020).
[30]
Ekaterina Svikhnushina and Pearl Pu. 2022. PEACE: a model of key social and emotional qualities of conversational chatbots. ACM Transactions on Interactive Intelligent Systems 12, 4 (2022), 1–29.
[31]
Luc van der Zandt, Esther van der Stappen, and Koen van Turnhout. 2021. Towards Real-Life Adoption of Conversational Interfaces: Exploring the Challenges in Designing Chatbots That Live up to User Expectations. In 34th British HCI Conference. BCS Learning & Development, 306–311.
[32]
Shih-Hung Wu and Sheng-Lun Chien. 2020. Learning the human judgment for the automatic evaluation of chatbot. In Proceedings of the Twelfth Language Resources and Evaluation Conference. 1598–1602.
[33]
Jennifer Zamora. 2017. I’m sorry, dave, i’m afraid i can’t do that: Chatbot perception and expectations. In Proceedings of the 5th international conference on human agent interaction. 253–260.

    Published In

    NSysS '24: Proceedings of the 11th International Conference on Networking, Systems, and Security
    December 2024
    278 pages
    ISBN:9798400711589
    DOI:10.1145/3704522

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 January 2025


    Author Tags

    1. Emotionally Aware ChatBot
    2. User Expectations
    3. ChatBot Evaluation
    4. Human-Computer Interaction

    Qualifiers

    • Research-article

    Conference

    NSysS '24

    Acceptance Rates

    Overall Acceptance Rate 12 of 44 submissions, 27%
