
1 Introduction

It is important for a service provider to increase the satisfaction of all its users, especially when a problem occurs while they are using the service. Resolving users’ problems and providing more customer service than expected improves user satisfaction and the company’s competitiveness [18]. Chatbots can answer customers’ inquiries cheaply, quickly, and in real time. In customer service (CS), chatbots are mainly used to answer repeated questions. As a result, CS personnel can work more practically and cost-effectively, because they are freed to give higher-value answers to customers [8]. Thus, more and more businesses are choosing chatbots for customer service [19].

In UX design, there have been attempts to improve user satisfaction and service completeness by providing services that cover multimodal users [22]. Chatbots have also evolved to provide optimized responses for each user [24]. However, users are so diverse, and their problems change so quickly and at such scale, that the popular research methods currently used to design customer service cannot keep up with them or extract the key features that may improve the user experience. These methods are surveys, self-reports, interviews, and user observation, and they usually take a lot of time, effort, and cost. At the same time, the amount of data they collect is restricted by small sample sizes and by the cognitive limitations of participants. The resulting data are therefore often insufficient to model the actual behavior of responsive users in a way that can be used to customize customer service.

This study is based on the case of Laftel, a cartoon streaming service [11]. Laftel recommends animations and webtoons based on user preferences. The service provides content that matches users’ interests and requires little expertise to enjoy. Its user base therefore changes faster, because its churn rate is higher than that of services providing professional content, and its main users are in their 10s and 20s. In addition, the service provider is a startup that needs an efficient yet effective approach to customer service. For these reasons, we chose Laftel as our case.

In this study, we introduce a data-driven framework for designing customer service chatbots that uses past customer behavior data from clickstreams and from a customer service chatbot. We apply cluster analysis to user data to segment users and build personas. To create a list of CS services, we process the company’s CS data with Natural Language Processing (NLP) to derive frequently used words and the words most similar to them. As a result, we generate a type of customer service chatbot for each persona.

This study suggests a way to provide corporate customer service efficiently and effectively, and we expect it to contribute to improving corporate value.

2 Literature Review

2.1 Data-Driven Personas

User experience (UX) design aims to elicit positive experiences through designs customized to users. In UX design, personas are often used to understand users. A persona categorizes users based on their behavior, goals, needs, and context. Namely, a persona is an artificial character that represents a user type within the population of potential target users. Cooper first applied the persona concept to design and user experience practice [7]; he considered a persona to be an archetypal user. Pruitt and Grudin argued that personas help designers understand users and their needs because they let designers perceive a user closely, as a person [21]. Norman argued that, in UX design, one can design the experience a person will have by empathizing fully with that person on the basis of a persona [17].

Traditionally, in user-oriented design, personas are built from data collected through surveys, self-reports, interviews, and user observation. However, this process yields a limited amount of data relative to its costs in labor, budget, and time. In addition, there is a gap between users’ actual behavior and their perception of that behavior. Furthermore, traditional user research methods are insufficient to support flexible services in industries that change quickly because of the fourth industrial revolution and responsive users.

2.2 Telemetry and Clickstream

A clickstream is the digital path a user takes through a website. The series of web pages requested by a visitor in a single visit is referred to as a session. Clickstream data include click path information that reveals the goals of service use, together with associated information such as timestamp, IP address, URL, status, number of bytes transferred, referrer, user agent, and cookie data, all collected in real time. Collecting and analyzing clickstream data is thus a more effective and efficient way to learn about user behavior than traditional methods.
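
The session definition above can be made concrete with a short sketch. It assumes raw log records of the form (visitor_id, timestamp, url) and a 30-minute inactivity timeout. The timeout is a common web-analytics convention and is not stated for Laftel or Beusable, and the function name `sessionize` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed convention: a visit ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split each visitor's page requests into sessions (click paths).

    records: iterable of (visitor_id, timestamp, url) tuples (hypothetical format).
    Returns {visitor_id: [[url, ...], ...]}, one inner list per session.
    """
    sessions = {}
    # Sort by visitor, then by time, so each visitor's requests are in order.
    for visitor, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        visitor_sessions = sessions.setdefault(visitor, [])
        last_ts = visitor_sessions[-1][-1][0] if visitor_sessions else None
        if last_ts is not None and ts - last_ts <= timeout:
            visitor_sessions[-1].append((ts, url))  # same visit continues
        else:
            visitor_sessions.append([(ts, url)])  # gap too long: new session
    return {v: [[u for _, u in s] for s in ss] for v, ss in sessions.items()}


t0 = datetime(2019, 1, 1, 20, 0)
log = [
    ("u1", t0, "/"),
    ("u1", t0 + timedelta(minutes=2), "/search"),
    ("u1", t0 + timedelta(hours=2), "/item/42"),
    ("u2", t0, "/"),
]
print(sessionize(log))
# {'u1': [['/', '/search'], ['/item/42']], 'u2': [['/']]}
```

The ordered URL lists produced here are the click paths that later steps compare across users.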

We can predict users’ needs and behavior by analyzing clickstream data. In UX research, clickstream data are used to understand the users of a website and to improve service quality [4]. Singh and Cancel used clickstream data to show that the users of a website have different needs for services and functions [26]. They also showed that service outcomes improve when web designs and product offerings are personalized based on a user’s path. Mobasher collected and analyzed clickstream data to design personalized web pages [16]. Xiang, Hans-Frederick, and Anil built personas from clickstream data using UX design methodology and showed that these personas reflect the actual behavior of users [28].

2.3 Chatbot

A chatbot is a computer program that combines artificial intelligence with messenger functions to perform certain tasks by communicating with humans through text messages. Gartner predicted that by 2021, more than 50% of companies would be managing AI-based chatbots within their apps [14]. Chatbots are well suited to answering simple questions, and they can answer in real time. Using chatbots in CS can therefore reduce labor costs and improve users’ satisfaction with CS, because CS consulting staff can be redeployed to more productive work [9].

There are two types of chatbots: open and closed. Closed chatbots are mainly used when their functions are limited or when few datasets are available. This type of chatbot restricts the user’s questions so that the answers are more accurate, but users do not feel as much interaction with it. Open chatbots provide a relatively comprehensive service and are used when many datasets are available. This type of chatbot gives users a high degree of freedom in asking questions, but the accuracy of its answers to specific questions is lower. However, it has the advantage of giving users a sense of interacting with the service. Recently, mixed chatbots that partially borrow each form to take advantage of both the closed and the open type have become common (Tables 1 and 2).
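
To make the closed/open trade-off concrete, here is a minimal sketch of a mixed chatbot. Input that matches a predefined menu option gets an exact answer (the closed part), and any other text goes through naive keyword spotting (the open part). The menu options, placeholder answers, and keywords are all hypothetical and are not taken from Laftel or from Tables 1 and 2.

```python
# Closed part: fixed menu options with exact, pre-written answers (placeholders).
CLOSED_MENU = {
    "1": "[answer about refunds]",
    "2": "[answer about payment methods]",
}

# Open part: naive keyword spotting on free text (hypothetical keywords).
OPEN_KEYWORDS = {
    "refund": "1",
    "payment": "2",
}

FALLBACK = "Please choose a menu number, or ask to talk to an agent."


def reply(user_input: str) -> str:
    text = user_input.strip()
    # Closed type: input restricted to predefined options, so the answer is exact.
    if text in CLOSED_MENU:
        return CLOSED_MENU[text]
    # Open type: any wording is accepted, but intent is only guessed from keywords.
    lowered = text.lower()
    for keyword, option in OPEN_KEYWORDS.items():
        if keyword in lowered:
            return CLOSED_MENU[option]
    return FALLBACK


print(reply("1"))                     # [answer about refunds]
print(reply("Can I get a refund?"))   # [answer about refunds]
print(reply("what's new this week"))  # falls back to the menu prompt
```

The keyword matcher is deliberately crude. A real open-type chatbot would use a trained intent classifier, but even then any free-text guess can miss, which is why the closed path gives the more accurate answers.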

Table 1. Comparison of closed chatbot and open chatbot
Table 2. Chatbot by input method

Beyond efforts in UX design to improve user satisfaction and service completeness by serving multimodal users [22], chatbots have evolved to provide optimized responses for each user [24]. Makar and Allen studied an algorithm that delivers different sentences to each persona in a chatbot service [1, 15]. Liu classified user types based on the postings users wrote and studied a chatbot that provides different sentences to each user [13].

3 A Data-Driven Design Framework for Customer Service Chatbot

We collect clickstreams as data on non-verbal user behavior and cluster them into several groups that segment users. In parallel, we collect users’ conversations with CS chatbots as data on verbal user behavior and classify them into a fixed number of labels following a predefined category system. Here, the category system consists of the services the business provides. Lastly, each user group is defined by a combination of services, so the relationship between user groups and services is one-to-many (Fig. 1).

Fig. 1. Data-driven design framework for customer service chatbot

3.1 Identify User Groups (Personas)

We build personas using hierarchical clustering. Hierarchical clustering groups targets by their similarity, measured here with Euclidean distance, and is especially useful when the total number of clusters is unknown. Constructing personas with hierarchical clustering involves the following steps.

Fig. a. Steps for constructing personas with hierarchical clustering
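
The steps appear only as a figure, so what follows is a generic sketch of agglomerative (bottom-up) hierarchical clustering with Euclidean distance, not a transcription of the authors’ steps. Average linkage and the `n_clusters` stopping rule are our assumptions. The points stand for hypothetical per-clickstream feature vectors, and the function name `agglomerative` is illustrative.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Naive agglomerative clustering; average linkage is an assumption.

    Starts with one cluster per point and repeatedly merges the closest pair
    of clusters until n_clusters remain. Returns clusters as lists of indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Average Euclidean distance over all cross-cluster point pairs.
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(agglomerative(points, 3))  # [[0, 1], [2, 3], [4]]
```

Because the number of clusters is unknown in advance, in practice one would record the full merge sequence (the dendrogram) and choose where to cut it by inspection, as the six clusters were read off Fig. 2 in Sect. 4.3.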

3.2 Identify Service Types

We analyze users’ conversations with the CS chatbot in Laftel to define the types of service the CS chatbot provides. In this study, the 20 most frequent nouns are defined as ‘key words’, and for each key word, the 10 words whose word vectors are most similar to it are defined as its ‘related words’. The procedure for defining service types is as follows. After steps 1 and 2, compile a table like Table 3, listing the key words, the number of times each key word is used, and the related words of each key word.
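
Selecting related words comes down to ranking words by vector similarity. The sketch below assumes cosine similarity, the usual choice for Word2Vec vectors, and uses toy 2-D vectors in place of a trained model. To our knowledge, a trained gensim model returns a comparable list via `model.wv.most_similar(key_word, topn=10)`. The function names and vectors here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_words(key_word, vectors, top_n=10):
    """Return the top_n words whose vectors are most similar to key_word's vector."""
    target = vectors[key_word]
    scored = [(cosine(target, vec), word)
              for word, vec in vectors.items() if word != key_word]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first; ties by word
    return [word for _, word in scored[:top_n]]


# Toy 2-D vectors (hypothetical); real Word2Vec vectors have many more dimensions.
vectors = {
    "payment": (1.0, 0.1),
    "refund": (0.9, 0.2),
    "fee": (0.8, 0.0),
    "video": (0.0, 1.0),
}
print(related_words("payment", vectors, top_n=2))  # ['fee', 'refund']
```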

Table 3. Examples of key words and related words 

3.3 Distance Between Clickstreams

Table 4. Example of collecting related words by key word
figure b

Second, build a square matrix whose entries are the number of related words shared by each pair of key words. Table b is an example of the square matrix built from Table a. In Table 3, word1 and word2 share two related words, ‘Inquiry’ and ‘(Monthly) fee’, so the value at (2,1) is 2. Finally, the square matrix is partitioned into n groups by H-clustering, and a representative keyword is selected for each group, as shown in Table 5.
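
The shared-related-words matrix can be built directly from the key-word-to-related-words table. In this sketch the related words ‘Inquiry’ and ‘(Monthly) fee’ follow the example in the text, while ‘Card’, ‘Coupon’, ‘Video’, and `word3` are hypothetical fillers. Leaving the diagonal at 0 is our own choice, since self-similarity is not used for grouping.

```python
def shared_related_matrix(related):
    """Build the square matrix of shared related words between key words.

    related: {key_word: [related_word, ...]}
    Returns (keys, matrix) where matrix[i][j] is the number of related words
    shared by keys[i] and keys[j]; the diagonal is left at 0 (unused).
    """
    keys = list(related)
    sets = [set(related[k]) for k in keys]
    n = len(keys)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] = len(sets[i] & sets[j])
    return keys, matrix


related = {
    "word1": ["Inquiry", "(Monthly) fee", "Card"],    # "Card" is hypothetical
    "word2": ["Inquiry", "(Monthly) fee", "Coupon"],  # "Coupon" is hypothetical
    "word3": ["Video"],                               # hypothetical key word
}
keys, m = shared_related_matrix(related)
print(m[1][0])  # 2, i.e. entry (2,1) in the 1-based indexing of the text
```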

Table 5. Example of selecting representative keyword

3.4 Matching Service List with Persona

The service types defined in Sect. 3.2 and the personas defined in Sect. 3.1 are matched as shown in Table 6.

Table 6. Example of service type matching with persona

4 Result

4.1 Collecting Clickstream Data

We used Beusable to track users visiting Laftel. Beusable provides basic statistics such as page views, average residence time, dropout rate, device statistics, monitor resolution distribution, and access routes, as well as clickstream data by user type (new visit and re-visit). We concentrated on access routes and clickstream data collected over three weeks. During these three weeks, 30,000 page views and 15,000 unique views were collected.

4.2 Calculating the Distances Between Clickstream

Users visited Laftel through ten routes. We extracted twenty groups of clickstream data in total: the clickstreams of new visitors and of re-visitors for each of the ten routes. However, eight groups with fewer than fifty page views (PVs) were excluded because they did not contain enough data to analyze. The remaining 12 groups of clickstreams were labeled as shown in Table 7.

Table 7. Page View for each type of access routes and user types

4.3 Selecting Representative Clickstreams Using H-Clustering

We computed the distances between the twelve clickstreams using Euclidean distance and generated an n × n matrix (n = 12), shown in Table 8. The element in the i-th row and j-th column is the distance between the i-th and the j-th clickstream.
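
The paper does not state how a clickstream group is turned into a vector before the Euclidean distance is taken. The sketch assumes each group is encoded as the share of its page views falling on each page, and the input format and function names are hypothetical. `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math


def to_vector(page_views, pages):
    """Encode a clickstream group as its share of page views on each page (assumed encoding)."""
    total = sum(page_views.values())
    return [page_views.get(p, 0) / total for p in pages]


def distance_matrix(clickstreams):
    """n x n Euclidean distance matrix between clickstream groups.

    clickstreams: {label: {page: page_views}} (hypothetical input format).
    """
    pages = sorted({p for pv in clickstreams.values() for p in pv})
    labels = list(clickstreams)
    vecs = [to_vector(clickstreams[label], pages) for label in labels]
    return labels, [[math.dist(u, v) for v in vecs] for u in vecs]


labels, d = distance_matrix({
    "S1": {"/": 50, "/search": 50},
    "S2": {"/": 100},
    "S3": {"/": 5, "/search": 5},
})
print(round(d[0][1], 4), d[0][2])  # 0.7071 0.0
```

Normalizing by total PV means groups are compared by the shape of their paths rather than by traffic volume: S1 and S3 come out identical despite a tenfold difference in page views. Using raw counts instead would make high-traffic routes dominate the distances.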

Table 8. The distance between clickstreams

Figure 2 shows the result of hierarchical clustering of the distance matrix in Table 8. We found six clusters of clickstreams in the result. We regarded the clickstream with the highest PV in each cluster as the representative of that cluster. The access routes and user types of the selected representatives are S3 (Direct-New), S4 (Direct-Return), S7 (Search-New), S8 (Search-Return), S10 (about.laftel.net-New), and S11 (msn.com/sprtan/ntp-Return).
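
The representative-selection rule is simple to state in code. The cluster memberships and PV counts below are hypothetical illustrations, not the values behind Fig. 2 or Table 7.

```python
def representatives(clusters, page_views):
    """Pick the clickstream with the highest page views (PV) in each cluster."""
    return [max(cluster, key=lambda label: page_views[label]) for cluster in clusters]


# Hypothetical clusters and PV counts, for illustration only.
clusters = [["S1", "S3"], ["S4"], ["S7", "S9"]]
page_views = {"S1": 120, "S3": 900, "S4": 300, "S7": 450, "S9": 60}
print(representatives(clusters, page_views))  # ['S3', 'S4', 'S7']
```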

Fig. 2. The result of H-clustering

4.4 Mapping Clickstreams to Common Workflows

We mapped the coordinates where a certain number of users of the selected clickstreams stayed onto the functional items of the Laftel website, and recorded the time of stay at each coordinate. We then compared the trends of the six selected clickstreams with each other. We discovered three personas: service explorers, soft users, and hard users, as shown in Table 9.

Table 9. The data-driven personas

Explorers.

These people visited the website through the corporate introduction page. They traversed the web pages exploring the services, checked whether the animations and cartoons that interested them were provided, and tried to find out whether those animations and cartoons could be purchased.

Soft Users.

These people came to the website through a search engine. They tended to consume the animations and webtoons they had already been consuming, and they searched for other content that could be consumed along with them. Their visits tended to be short: they spent 50% of the time consuming content and the rest watching commercials.

Hard Users.

These people visit the website by entering its URL directly. They visit to see content they have consumed before. They tended to stay on the website for a relatively long time, spending 70% of it on content consumption and the rest on commercials and search.
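
Time shares like the 50% and 70% reported above come from mapping stay coordinates onto functional items and summing the dwell time per item category. In the sketch below, the page layout (bounding boxes), category names, and stay records are all hypothetical, chosen only to show the computation. The paper does not describe Beusable’s export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Bounding box of one functional item on a page, in pixels (hypothetical layout)."""
    name: str
    category: str  # e.g. "content", "commercial", "search"
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def time_share_by_category(stays, regions):
    """Map each stay coordinate to a functional item and return dwell-time shares.

    stays: iterable of (x, y, seconds) for recorded stays (hypothetical format).
    Coordinates outside every region are counted as "other".
    """
    totals = {}
    for x, y, seconds in stays:
        region = next((r for r in regions if r.contains(x, y)), None)
        category = region.category if region else "other"
        totals[category] = totals.get(category, 0) + seconds
    grand_total = sum(totals.values())
    return {category: t / grand_total for category, t in totals.items()}


regions = [
    Region("player", "content", 0, 0, 800, 450),
    Region("ad banner", "commercial", 0, 450, 800, 550),
    Region("search box", "search", 800, 0, 1000, 100),
]
stays = [(100, 100, 70), (100, 500, 20), (900, 50, 10)]
print(time_share_by_category(stays, regions))
# {'content': 0.7, 'commercial': 0.2, 'search': 0.1}
```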

4.5 Extract Key Words and Related Words

Laftel is a Korean-language service. We used KoNLPy and its Kkma analyzer, which are Korean natural language processing tools, to count word frequencies, and Word2Vec, which assigns vector values to words, to measure the similarity between words. The NLP analysis yielded the following most frequently used ‘key words’ in the top 20: Payment (1910), Refund (1103), Point (792), Monthly (714), Purchase (637), Possibility (561), Work (539), Possibility (405), Playback (325), Video (278), Animation (271), Free (of charge) (266), Cancellation (256), Advertisement (243), Viewing (237), Confirm (234), Authentication (228), Cancel (227), and Publication right (225). Table 10 shows the related words for each key word.
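
The frequency step can be sketched with the standard library once nouns have been extracted. We assume one noun list per CS conversation. With KoNLPy these lists could come from `Kkma().nouns(text)`, which needs a Java runtime, so the sketch skips that step and uses hypothetical English stand-ins for the Korean data.

```python
from collections import Counter


def top_key_words(nouns_per_conversation, top_n=20):
    """Count noun frequencies over all CS conversations and return the top_n.

    nouns_per_conversation: one list of already-extracted nouns per conversation.
    """
    counts = Counter(noun for nouns in nouns_per_conversation for noun in nouns)
    return counts.most_common(top_n)


# Hypothetical, pre-tokenized stand-ins for Korean CS conversations.
conversations = [
    ["payment", "refund"],
    ["payment", "refund", "point"],
    ["payment"],
]
print(top_key_words(conversations, top_n=2))  # [('payment', 3), ('refund', 2)]
```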

Table 10. The key words and related words in CS List

4.6 Service Classification and Typing

We grouped words by their similarity and defined the service types. Table 11 shows the similarity between key words: each value is the number of related words shared by the two key words on the x and y axes.

Table 11. Similarity between keywords

The result of H-clustering this table is shown in Fig. 3. Based on the results, the key words of each service type were selected as shown in Table 12. There are two major categories, ‘Content’ and ‘Account’, each with three service types, for six service types in total: ‘Content advertisement’, ‘Content consumption’, ‘Content etc.’, ‘Account membership’, ‘Account authentication’, and ‘Account benefit’. The key words of the first service type, ‘Content advertisement’, are ‘Advertisement’ and ‘Confirm’. The key words of the second service type, ‘Content consumption’, are ‘Animation’, ‘Free (of charge)’, ‘Cancellation’, ‘Viewing’, ‘Authentication’, and ‘Cancel’. The third service type, ‘Content etc.’, has the key word ‘Publication right’. The key words of the fourth service type, ‘Account membership’, are ‘Payment’, ‘Refund’, ‘Monthly fee’, ‘Purchase’, ‘Inquiry’, and ‘Consumption’. The key words of the fifth service type, ‘Account authentication’, are ‘Possibility’ and ‘Video’. The key words of the last service type, ‘Account benefit’, are ‘Point’ and ‘Work’.
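
The paper does not say which linkage was used to H-cluster the similarity table. Under the assumption of single linkage, with distance defined as a constant minus the shared count, cutting the dendrogram at a given height is equivalent to taking connected components of the graph that links key words sharing at least `min_shared` related words. The sketch computes those components with union-find. The 5 × 5 matrix and the `min_shared` values are hypothetical, not Table 11.

```python
def group_key_words(keys, shared, min_shared=1):
    """Group key words by single-linkage clustering on shared related words.

    With distance d(i, j) = M - shared[i][j] for a constant M >= every entry,
    cutting a single-linkage dendrogram at height M - min_shared gives exactly
    the connected components of the graph linking key words that share at
    least min_shared related words. Union-find computes those components.
    """
    parent = list(range(len(keys)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if shared[i][j] >= min_shared:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, key in enumerate(keys):
        groups.setdefault(find(i), []).append(key)
    return list(groups.values())


keys = ["Payment", "Refund", "Advertisement", "Confirm", "Point"]
shared = [  # hypothetical shared-related-word counts (symmetric)
    [0, 3, 0, 0, 0],
    [3, 0, 0, 0, 1],
    [0, 0, 0, 2, 0],
    [0, 0, 2, 0, 0],
    [0, 1, 0, 0, 0],
]
print(group_key_words(keys, shared, min_shared=2))
# [['Payment', 'Refund'], ['Advertisement', 'Confirm'], ['Point']]
```

Lowering `min_shared` to 1 merges ‘Point’ into the ‘Payment’ group, which shows how the cut height controls how many service types emerge.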

Fig. 3. The result of H-clustering the service list

Table 12. The list of service type classifications and representative keywords selected

4.7 Matching Service List to Persona

Table 13 shows the service types and key words matched to the needs of each persona. Explorers, who are new users of the Laftel service, are matched with ‘Content consumption’, a service related to information about the content Laftel provides, and ‘Account membership’, a service providing membership information. Soft users, who are existing users with a relatively low degree of service utilization, need ‘Content consumption’, ‘Content advertisement’, ‘Account membership’, and ‘Account authentication’, an authentication service required at the initial stage of an account. Hard users, who use the service frequently and for long periods, are matched with ‘Content consumption’, ‘Account membership’, and ‘Account benefit’, an additional reward service for each account.
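
The matching above can be expressed as a lookup table that a persona-specific chatbot consults to decide which services to offer first. The persona-to-service entries are transcribed from this section. Routing a visitor to a persona by access route is our own assumption, loosely based on the routes observed in Sect. 4.4 (corporate introduction page for explorers, search engine for soft users, direct URL for hard users). The route keys and the function name are hypothetical.

```python
# The six service types from Sect. 4.6.
ALL_SERVICE_TYPES = [
    "Content advertisement", "Content consumption", "Content etc.",
    "Account membership", "Account authentication", "Account benefit",
]

# Persona -> service types, as matched in Sect. 4.7.
PERSONA_SERVICES = {
    "explorer": ["Content consumption", "Account membership"],
    "soft user": ["Content consumption", "Content advertisement",
                  "Account membership", "Account authentication"],
    "hard user": ["Content consumption", "Account membership", "Account benefit"],
}

# Hypothetical routing rule: access route -> persona.
ROUTE_TO_PERSONA = {
    "about.laftel.net": "explorer",
    "search": "soft user",
    "direct": "hard user",
}


def chatbot_menu(access_route):
    """Return the service types a persona-specific CS chatbot would offer first.

    Unknown routes fall back to the full list of service types.
    """
    persona = ROUTE_TO_PERSONA.get(access_route)
    return PERSONA_SERVICES.get(persona, ALL_SERVICE_TYPES)


print(chatbot_menu("direct"))        # ['Content consumption', 'Account membership', 'Account benefit']
print(len(chatbot_menu("msn.com")))  # 6 (fallback to all service types)
```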

Table 13. Matching service type by persona

5 Conclusion

In this study, we introduced a data-driven framework for designing customer service chatbots. First, we used Beusable to collect Laftel’s clickstream data and applied hierarchical clustering to generate three personas: explorers, soft users, and hard users. Explorers visit the website to see whether animations and webtoons that interest them are available and whether they can be purchased. Soft users stay on the website for a short time, spending 50% of it on content consumption and the rest on commercials. Hard users stay on the website for a long time, spending 70% of it on content consumption and the rest on commercials and content search. Second, we defined CS service types by applying NLP to the company’s CS data. We extracted key words with a high frequency of use and, for each key word, the related words closest to it in the word vector space. We defined the similarity between two key words as the number of related words they share and clustered the key words by H-clustering on this measure. We grouped the key words into six service types and grouped these six clusters into two categories, ‘Content’ and ‘Account’. The first category, Content, has three service types: ‘Advertisement’, ‘Consumption’, and ‘etc.’. The second category, Account, also has three service types: ‘Membership’, ‘Authentication’, and ‘Benefit’. As a result, we generated three types of customer service chatbots, one for each persona: ‘Content consumption’ and ‘Account membership’ services for explorers; ‘Content consumption’, ‘Content advertisement’, ‘Account membership’, and ‘Account authentication’ services for soft users; and ‘Content consumption’, ‘Account membership’, and ‘Account benefit’ services for hard users.

Through the literature review, we confirmed the potential of data-driven personas. Along with the study of Xiang, Hans-Frederick, and Anil [28], this study shows a way to build personas from users’ clickstream data. However, not every service can collect every single clickstream. Often, only a collection of anonymous clickstreams is accessible and retrievable, using tools such as Beusable. Moreover, few studies have analyzed streaming service users through clickstream data. Laftel, however, is a popular streaming service used by the general public, so our study can show the potential of data-driven design for streaming services in general. We also confirmed the potential of CS chatbots customized by persona. Previous studies focused on giving the same answer in different sentences [1, 13, 15], whereas this study aims to recognize the services each user mainly uses and to provide optimized services for each persona.

This study is meaningful in that all of its methodologies are data-driven, applying data processing to UX methodology. This means that the usefulness of a design can be quantified based on user behavior data. Based on our results, Laftel can modify its CS service and validate the usefulness of our approach using A/B testing. In future work, we may increase the size of the data and determine the minimum amount of data needed for a service provider to obtain a meaningful result.

All of the data used in this study are anonymous, so there is no concern regarding users’ privacy.