1 Introduction

In recent years, personal communication via interactive assistants has become mainstream on almost every personal interaction device [1]. Commonly referred to as “chatbots”, generic implementations use natural language to communicate with the user and let the user interact with the system or the device. Chatbots draw on natural language processing, human-computer interaction, dialogue systems, virtual characters and other related technologies [2, 3].

Depending on their use, chatbot designs can be classified as task-oriented or non-task-oriented [4]. The former are closed-domain, designed for particular tasks and short goal-oriented conversations. The latter are open-domain, generic conversational agents designed to simulate a conversation and provide personal assistance and communication to users, using broad natural language expressions.

Domain-specific chatbots require very high effectiveness and are evaluated on their ability to perform very well on specific functions, such as learning [5, 6]. Domain-independent chatbots, on the other hand, require broad conversational capacity, using natural language processing and rules or AI for complex interaction [7, 8].

The design of the conversational interface for a chatbot takes into account both the domain (context) and the task (goal). This work aims to investigate how the purpose of the design is perceived by designers and users and how semantics can be used in the formulation of the design.

Based on the above, a relevant research question is how designers and users perceive the design of a relatively complex interaction system. The purpose of the system, as well as the semantics behind its intended use, are key to user acceptance during first-time introduction and interaction. This work examines how the intended design is perceived by designers and users and reports on the semantics that lead to the design formulation and their impact on the design principles.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the experimental setup and the methodology. Section 4 presents the evaluation results, while Sect. 5 concludes the paper and presents the future work.

2 Related Work

Chatbots are deployed in a multitude of settings and are used for several tasks. For tasks such as e-learning [9], chatbots may be deployed from a central point and accessed from school, from home or on a mobile device. Mobile devices are very suitable platforms since they offer dedicated support for visual, spoken and tactile interaction. Particular chatbot implementations can be designed primarily for mobile use, such as the ones for recommendation of tourist sites [10]. Other chatbots may be accessed at specific locations, such as an airport [11]. Turunen et al. (2011) designed chatbots for health and fitness companionship [12].

Ciechanowski et al. (2019) showed that human perception of a chatbot differs depending on the type of chatbot, text-based or avatar [13]. They found that users were more reserved and felt less positively towards the avatar chatbot. Their results show that user emotions change depending on the chatbot type. The type of user and the topic of discussion are also important for the acceptance of the chatbot as an interaction partner. Studies show that specific user types may prefer talking to chatbots rather than to other people about specific topics [14].

Social media settings are different from real-life settings. The former can facilitate social interaction with results as acceptable as those with a real-life human for information seeking and learning activities [15]. Regarding the tasks that are the focus of the human-chatbot interaction, specific activities, such as language practice and learning, were found to be more interesting if performed by a human rather than a chatbot instructor [16]. This underlines the limitations of educational technology (and of all the technologies employed for such tasks) compared with the expertise, abilities and knowledge of the educator. The human educator would clearly possess a much higher command of language and educational expertise, resulting in a better educational experience. Moreover, the language technology required for such tasks is resource heavy for natural language analysis [17,18,19], speech synthesis [20,21,22,23] and dialogue [24] alike.

Social norms, such as politeness, are also important factors for the user acceptance of the chatbot [25]. Lee and Choi (2017) measured how a chatbot for movie recommendation established relationships with human users using social communication processes, such as self-disclosure and reciprocity [26].

Cuayahuitl et al. (2019) applied deep reinforcement learning to dialogue data to train chatbots [27]. Their human user evaluation showed that the chatbot phrases that were similar to human natural language had high acceptance and triggered engagement. To make chatbot personas more accurate and, therefore, the chatbots more acceptable, data-driven design can be used for creating the chatbot personas as well as for matching services to these personas [28]. Such a requirement is important in situations where the chatbot is an all-day companion or partner, such as an emotion-aware wellbeing chatbot [29]. Another recent user experience evaluation study suggests that users expect action cues when interacting with a chatbot [30].

Semantics in general can be utilised to aid the design of the chatbot. Recent works used situational characteristics to tailor the design to a specific context [31]. Other works utilised chatbots in settings that traditionally require expert natural language control, such as journalism [32]. In open-domain or expert situations, chatbots may succeed in the main tasks they are designed for and still fail to satisfy the users' natural communication requirements, as in the case of the expert recommender chatbot for Discord, which failed to provide the conversational behaviour expected by the users [33].

3 Experiment Setup and Methodology

A total of 12 university students (3 female, 9 male) participated voluntarily in the laboratory study. The participants were recruited through online advertisements, the university web forum and email invitations, and were compensated with Amazon credit vouchers. The participants ranged in age from 19 to 36 with an average age of 26.45 years (SD = 3.48). Regarding computer literacy, they reported a mean experience of 13.32 years (SD = 3.30) and 5.02 h of daily usage (SD = 1.84).

Three chatbot designers were recruited from the human-computer interaction lab of the university. They all had degrees in computer science and proven expertise in designing chatbots for the laboratory exercises of a postgraduate-level human-computer interaction course. The designers and the users were briefed about the aim and goals of the study, and they exchanged their views on the design idea.

In this experiment, the chatbot designers designed chatbots (all for the same domain of application, a museum) based on the general semantics of purpose. The purpose was chosen among (a) exhibit presentation, (b) exploration and (c) learning. The design itself was left to the designers: each designer selected a purpose for their design, and all used the same chatbot design technology.

The designers provided their vision of the chatbot purpose and the design, which was recorded. Similarly, the users provided their own expectations for chatbot design purpose, for each of the types above. The interpretation of the semantics of the purpose between designers and users was investigated to measure their similarity. The semantics of purpose were also the basis for setting the user expectations and their matching to the design requirements. The summary of the designer and user understanding of the semantics of purpose is provided in Table 1.

Table 1. Designer aims and user expectations.

The designers took a technically oriented approach to the chatbot aims. First, they established the type of dialogue the chatbot needed in order to achieve its core purpose. They opted for chatbot-directed dialogue for exhibit presentation and learning, and mixed-initiative dialogue for exploration. Regarding the latter, they envisioned that exploration would be open for the user to steer, based on the user input, questions and directions, as well as the chatbot recommendations for similar or semantically linked artifacts. The designers also reported on the core design decisions on how to implement each design. For example, the learning scenario would have to account for advanced information to be provided to the user by the system, especially when there are follow-up questions by the users.

The users also discussed and agreed on their main expectations for each purpose. The aforementioned example is also valid for the user perspective, in which case the users expected to pose questions to the system during their interaction for learning about an exhibit. Therefore, they also expected to have advanced information available to them.

The designers, informed about the user requirements, designed the chatbots, that is, the chatbot conversational interfaces. For the purpose of this work, they designed short interaction scenarios and not complete designs.

Fig. 1. Chatbot design for exhibit presentation.

Figure 1 depicts a screenshot from the design environment for the exhibit presentation. In this scenario, the user is interested in the Venus de Milo, the statue of Aphrodite of Milos, now on permanent display in the Louvre Museum in Paris. The chatbot makes the introductions and asks the user if they are interested in finding out more about Aphrodite.

The designers provided a breakdown of their design principles which were communicated to the participants in a focus group setting. The outcome was recorded for use in the formal evaluation.

4 Evaluation

All 12 participants took part in the evaluation sessions, in which they engaged with the three semantics-driven chatbot scenarios. The scenarios were presented at random, two for each participant. The participants completed the conversation and provided subjective feedback through online evaluation questionnaires. The aim was to see whether the semantics of purpose, as perceived by the users and the designers, matched their expectations of the chatbot-human interaction. Moreover, the participants were asked to rate the user experience in terms of friendliness and acceptance.

The first point of interest was the user feedback regarding the type of chatbot. Each participant interacted with two (out of three) scenarios and reported on the perceived purpose of the chatbot. Participants interacted with two rather than all three scenarios so that they could not simply deduce the purpose of the third scenario by elimination.
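The two-out-of-three assignment described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the paper does not state how the randomization was carried out, so the balancing step (each pair of scenarios given to the same number of participants), the function name `assign_scenarios`, the participant labels and the seed are all assumptions.

```python
import random
from collections import Counter
from itertools import combinations

SCENARIOS = ("presentation", "exploration", "learning")


def assign_scenarios(participant_ids, seed=None):
    """Give each participant two of the three scenarios in random order.

    Hypothetical helper: balancing exposures per scenario is an assumption,
    not something the paper specifies.
    """
    rng = random.Random(seed)
    pairs = list(combinations(SCENARIOS, 2))  # the three possible pairs
    if len(participant_ids) % len(pairs) != 0:
        raise ValueError("participant count must be a multiple of 3 to balance")
    # Repeat each pair equally often, then shuffle who receives which pair.
    pool = pairs * (len(participant_ids) // len(pairs))
    rng.shuffle(pool)
    assignment = {}
    for pid, pair in zip(participant_ids, pool):
        order = list(pair)
        rng.shuffle(order)  # randomize the order within the pair
        assignment[pid] = tuple(order)
    return assignment


# Twelve placeholder participant labels (P01..P12).
assignment = assign_scenarios([f"P{i:02d}" for i in range(1, 13)], seed=0)
exposures = Counter(s for pair in assignment.values() for s in pair)
```

Under this balanced scheme, 12 participants with two scenarios each yield 24 sessions, so every scenario is seen by eight participants.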

Fig. 2. One of the simplest and most accurate ways to introduce an exhibit, according to the user feedback. Does it provide a hint as to the purpose of the chatbot?

Fig. 3. User-perceived purpose after interaction.

Figure 2 shows a commonly used, simple and usable way to start a friendly conversation. At this point the actual dialogue is still in the initial stage and the users do not have enough information to differentiate between the possible scenarios.

Figure 3 shows how the users perceived the purpose of each chatbot scenario. The presentation was perceived accurately by 50% of the participants, while three of them (37.5%) perceived it as learning. The justification given by these users was that the presentation contained a lot of information about the exhibits, which closely matched their expectations. The learning scenario, on the other hand, was perceived accurately by 75% of the participants, while the remaining two reported one of the other purposes each. The justification given by the designers was that, for these two participants, the learning chatbot interaction repeatedly failed to trigger the rule-based intelligence and, therefore, neither provided the detailed information nor revealed its purpose. The exploration was perceived accurately by all participants. All parties agreed that the expectations were met. The designers were convinced that the choice of mixed-initiative dialogue was a clear reason for the participants to perceive the purpose as it was intended.
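The reported percentages can be checked with a small tally. The counts below are not taken from a published table: they are reconstructed from the percentages in the text under the assumption that each scenario was seen by eight participants (12 participants, two scenarios each, three scenarios), which is consistent with three participants corresponding to 37.5%. The names `perceived_counts` and `perception_rates` are illustrative.

```python
SCENARIOS = ("presentation", "learning", "exploration")

# Rows: purpose intended by the designers. Columns: purpose reported by users.
# ASSUMPTION: eight sessions per scenario, reconstructed from the percentages.
perceived_counts = {
    "presentation": {"presentation": 4, "learning": 3, "exploration": 1},
    "learning": {"presentation": 1, "learning": 6, "exploration": 1},
    "exploration": {"presentation": 0, "learning": 0, "exploration": 8},
}


def perception_rates(counts):
    """Convert each row of counts into percentages of that row's total."""
    rates = {}
    for intended, row in counts.items():
        total = sum(row.values())
        rates[intended] = {p: 100.0 * n / total for p, n in row.items()}
    return rates


rates = perception_rates(perceived_counts)
for intended in SCENARIOS:
    print(intended, rates[intended])
```

The diagonal of this table gives the share of sessions in which the intended purpose was recognized, and the off-diagonal cells show which purposes were confused with each other.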

The users provided feedback on the friendliness and overall acceptance of the chatbot design. To gain insight into how the semantics of purpose may be relevant to the design and acceptability of chatbots, we compared the user feedback after the interaction. The comparison was in terms of user-reported friendliness and overall acceptability, aggregated once by the designer-intended purpose and once by the user-perceived purpose. Friendliness was selected since it is one of the main characteristics that help conversational agents achieve their goals, as well as a key component of user satisfaction.

Fig. 4. User feedback aggregated by purpose from the designers.

Figure 4 shows the average user evaluation scores (Likert scale 1-5) for each design (as designated by the designers) regarding friendliness and acceptance. The exploration design achieved the highest scores, while the other two designs scored close to each other.

Fig. 5. User feedback aggregated by user perceived purpose.

Figure 5 shows the average user evaluation scores (Likert scale 1-5) for each design (as perceived by the users) regarding friendliness and acceptance. The exploration design achieved the highest scores. Note that this group aggregates the scores of the designer-intended exploration designs (all users agreed with the designers on the exploration purpose assignment) plus two instances (one presentation and one learning) that the users reported as exploration.
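The two aggregations behind Figs. 4 and 5 differ only in the grouping key, which a short sketch makes explicit. The individual ratings are not reported in the paper, so the records below are placeholder values chosen purely to exercise the code; they are not study data, and the field names and the helper `mean_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# PLACEHOLDER records, NOT data from the study: one record per session with
# the designer-intended purpose, the user-perceived purpose and 1-5 ratings.
sessions = [
    {"intended": "presentation", "perceived": "presentation", "friendliness": 4, "acceptance": 3},
    {"intended": "presentation", "perceived": "learning", "friendliness": 3, "acceptance": 4},
    {"intended": "learning", "perceived": "learning", "friendliness": 3, "acceptance": 4},
    {"intended": "learning", "perceived": "exploration", "friendliness": 4, "acceptance": 4},
    {"intended": "exploration", "perceived": "exploration", "friendliness": 5, "acceptance": 5},
]


def mean_scores(records, key):
    """Average friendliness and acceptance per purpose, grouped by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {
        purpose: {
            "friendliness": mean(r["friendliness"] for r in rows),
            "acceptance": mean(r["acceptance"] for r in rows),
        }
        for purpose, rows in groups.items()
    }


by_designer = mean_scores(sessions, "intended")  # grouping used for Fig. 4
by_user = mean_scores(sessions, "perceived")     # grouping used for Fig. 5
```

Grouping by `perceived` moves misclassified sessions into another purpose's average, which is why the two figures can differ even though they draw on the same ratings.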

The main perceptual difference between the user-perceived and the designer-intended designs can be seen in Fig. 5 for the presentation and learning user evaluation scores. The distance between the scores is much larger: presentation had lower acceptance and higher friendliness, while the opposite held for learning. This shows that the users perceived friendliness as a presentation attribute, while learning was associated with a more formal communication quality.

These results indicate that the semantics of purpose can be perceived differently by designers and users, which may lead to subtle expectations beyond the formal requirements.

5 Conclusion and Future Work

This work reported on the results of an experimental study on the perceived purpose of chatbot designs. Designers and users shared their requirements and expectations, but each group comprehended the semantics from its own point of view. The main task of the participants was to report the purpose of the chatbot as they perceived it from their interaction. The reported evaluation showed that, for users and designers alike, the formulation of acceptance and design criteria, respectively, takes place on a semantic level.

The limitations of this work lie mainly in the small number of participants and in the fact that the designs and the evaluation were produced specifically for the experimental study. Formal evaluation of conversational systems requires rigorous protocols and a large number of participants in appropriate experimental settings. The choice of evaluating overall acceptance and friendliness, instead of a meticulous usability evaluation using standard tests, was deliberate, since the designs were not finalized or field tested, but rather specifically produced for the study.

This work may find use in chatbot design. Specifically, the user perception of purpose and the parameters that may affect that could be useful for the design of socially-adaptive chatbots [34] and works in semantics for communication [35, 36].

Future work includes applying the proposed methodology to recommender systems [37,38,39], especially in combination with collaborative filtering techniques [40,41,42,43,44]. Finally, we plan to incorporate the proposed approach in social and tourism-related recommendation applications [45,46,47,48,49].