
1 Introduction

User-generated content (UGC) comes from regular people who voluntarily contribute data, information, or media that then appears before others in a useful or entertaining way, usually on the Web; examples include restaurant ratings, wikis, and videos. The use of such content has grown rapidly in recent years, in part because it is fairly inexpensive to obtain (users normally supply it at no charge) [1]. In addition, UGC text carries rich emotional information, which to a certain extent reflects the quality of experience (QoE) of a product or service. Compared with traditional survey methods, UGC texts offer easy collection, low financial cost, high ecological validity, and low sampling error [2]. As a result, researchers have produced a substantial body of work on how UGC can be used to analyze the experience of products and services.

Emotion analysis of text is an important research direction for applying UGC text to the evaluation of QoE. It aims to classify the emotions expressed in a text into one or more of a set of predefined categories. To recognize emotions, this classification task typically relies on emotion lexicons or on an existing machine learning classifier [3].

Emotion analysis requires careful modeling of text because words are associated with different emotions, at varying levels of magnitude, in different contexts, which makes it harder to identify the words to use for document representation. A lexicon containing such emotional words is called an emotion lexicon. Existing general purpose emotion lexicons (GPELs) include WordNet-Affect (WNA), EmoSenticNet (ESN), and the NRC word-emotion lexicon. However, these lexicons perform poorly in modeling the user experience of specific products. For example, "Glee" would normally connote joy, but should be treated as neutral in a document corpus about the television series of the same name [4]. It is therefore important to develop a domain specific sentiment lexicon (DSSL).

One study built a lexicon for the automotive field from the UGC of an automotive vertical website. It used a word2vec model to conduct emotion analysis on the UGC, built a set of evaluation items using automotive product expertise, and finally determined the weights of the evaluation items by using the analytic hierarchy process together with the frequency and proportion of the words corresponding to each evaluation index [5]. Another study extracted keywords from comments on 75 sample mobile phones, calculated the Kansei tendency of the keywords by using lexical similarity, and then extracted the product features. Finally, a BP neural network took the product features as inputs to predict the Kansei parameters of the products, and a good prediction effect was obtained [6]. Both are valid explorations of QoE evaluation using a domain specific sentiment lexicon.

In recent years, the development and pervasiveness of 5G networks have lowered the bandwidth and network-delay obstacles facing cloud-based mobile services. Cloud gaming technology can theoretically provide users with a high-quality gaming experience on any terminal [7]. However, it is challenging to deploy network and computing power according to the dynamic network environment in which cloud gaming players are located so as to provide reliable and stable QoE [8]. Reliable measurement is the premise for evaluating the QoE of cloud gaming (CGQoE) [9].

In view of the above problems, this study develops a domain specific sentiment lexicon for evaluating the quality of experience of cloud gaming. The main technical steps are as follows:

  1. Extract feature words and count their frequencies using CountVectorizer and its fit_transform method from sklearn.feature_extraction.text in Python, and vectorize each comment text using the feature words and frequencies.

  2. Create a DSSL with feature words selected by the expertise of CGQoE.

  3. Obtain the QoE evaluation data based on the DSSL with the approach of Emotion Distribution Learning (EDL), which maps the emotion information contained in cloud gaming-related UGC text into an emotion distribution represented by multiple emotion feature words.

  4. Analyze the criterion validity of the QoE evaluation data.

  5. Filter and correct the dataset of comment texts based on the DSSL.

  6. Analyze the lexicon’s discriminative power in differentiating the QoE of eight cloud gaming services.

The contributions of this study include:

  1. Create a DSSL of CGQoE based on the emotional information in the UGC texts of cloud gaming and the existing expertise in the field of CGQoE.

  2. Evaluate the QoE of comment text based on the DSSL and EDL approaches.

  3. Analyze the potential of transforming DSSL into a psychometric scale of CGQoE based on criterion validity and discriminative power.

2 Related Work

2.1 Approach of Evaluating the CGQoE

The purpose of this study is to develop a DSSL from UGC texts and use it to evaluate the CGQoE. It is therefore critical to obtain the emotional words to include in the DSSL and to verify the validity of the DSSL.

This study is inspired by questionnaire surveys with psychometric properties, in which each measurement item corresponds to a semantically unique evaluation index. Accordingly, it is critical to use the multiple emotional feature words extracted from UGC to evaluate, in a structured way, the QoE contained in each UGC text and to measure the validity of the evaluation results.

In the classic single-label approach to text emotion, each piece of text can only be represented by a single emotion tag [10], which is insufficient for analyzing the CGQoE via several structured items as in the current study. The Emotion Distribution Learning (EDL) method can map one or more sentences or paragraphs of user-generated content to a distribution over multiple emotion tags or feature words. In this study, the EDL approach is used to characterize the cloud gaming QoE in UGC.

Some studies have shown that the emotional information in a sentence can be characterized by the emotional words it contains [11], which makes it possible to label emotion distributions based on emotional feature words. The goal is to map the emotional information in a text into a vector space spanned by many emotional feature words, with each distribution label carrying information about emotional intensity. Some researchers found that an EDL approach based on six emotion feature words (joy, fear, anger, surprise, sadness, and disgust) outperforms other approaches on the SemEval training set, both in overall emotion prediction performance and in the prediction of single emotion feature words [3]. Because a separate emotion lexicon does not reflect the relationships between emotion words, other scholars proposed using an emotion wheel to strengthen emotion distribution labeling. In an emotion recognition test on seven Chinese and English text emotion datasets, this strategy outperformed other emotion distribution labeling methods [10].

2.2 EDL Approach Based on the DSSL, Semantic Similarity, and Word Frequency

The purpose of EDL in this study is to quantitatively evaluate the QoE of each UGC text across multiple dimensions based on the created DSSL, that is, to map the emotional information in the UGC texts to the distribution of emotions \(d_{i}\) over several emotional feature words, so as to obtain the QoE evaluation result [10].

$$ d_{i} = \left\{ d_{i}^{j} \right\}_{j = 1}^{C} $$
(1)

\(d_{i}^{j}\) refers to the extent to which the text \(t_{i}\) in UGC can be represented by the emotional feature word j, and C is the number of emotional feature words. This study examines two EDL approaches:

  • Based on the frequency of feature words in UGC texts.

  • Based on the semantic similarity between feature words and UGC texts.

The EDL approaches proposed in this study include the following steps:

  1. Use jieba.cut in Python to segment the Chinese UGC text and extract feature words.

  2. Use CountVectorizer from sklearn.feature_extraction.text to count the frequency of the feature words after word segmentation.

  3. Use its fit_transform method to convert each text into a vector represented by multiple feature words (two or more, to meet the requirement of structural degree).

  4. Based on the expertise of CGQoE, reduce the feature word base to one that contains only QoE-related feature words.

  5. Use sentiments in snowNLP to rate the emotion of each comment text.

  6. Use sentiments in snowNLP to rate the emotion of each feature word extracted from the UGC.

  7. Characterize the emotion distribution:

    • In the frequency approach, \(d_{i}^{j}\) is the product of the frequency with which the feature word j appears in the text \(t_{i}\) and the emotion score of the feature word j. All \(d_{i}^{j}\) together form the emotion distribution \(d_{f}\) of the frequency approach.

    • In the similarity approach, use similarity in simtext to calculate the semantic similarity between the feature word j and the UGC text \(t_{i}\). \(s_{i}^{j}\) is the product of this semantic similarity and the snowNLP emotion score of the text \(t_{i}\). All \(s_{i}^{j}\) together form the emotion distribution \(d_{s}\) of the similarity approach.

The results of the two EDL approaches are tested in the experimental part.
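The two labeling rules can be illustrated with a minimal sketch. It uses only the Python standard library and assumes that segmentation and emotion scoring have already been done (in this study, by jieba and snowNLP). The function names, the toy tokens, the scores, and the character-overlap similarity are all hypothetical stand-ins; in particular, `char_jaccard` is not the simtext measure.

```python
from collections import Counter
from typing import Callable, Dict, List


def frequency_distribution(tokens: List[str],
                           word_scores: Dict[str, float]) -> Dict[str, float]:
    """d_i^j = (occurrences of feature word j in text t_i) * (emotion score of j)."""
    counts = Counter(tokens)
    return {word: counts[word] * score for word, score in word_scores.items()}


def similarity_distribution(text: str,
                            text_score: float,
                            feature_words: List[str],
                            similarity: Callable[[str, str], float]) -> Dict[str, float]:
    """s_i^j = similarity(feature word j, text t_i) * (emotion score of t_i)."""
    return {word: similarity(word, text) * text_score for word in feature_words}


def char_jaccard(word: str, text: str) -> float:
    """Toy stand-in for a semantic similarity measure (NOT the simtext metric)."""
    a, b = set(word), set(text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical segmented comment and hypothetical emotion scores in [0, 1].
tokens = ["picture", "stuck", "stuck", "delay"]
word_scores = {"stuck": 0.2, "delay": 0.3, "smooth": 0.9}

d_f = frequency_distribution(tokens, word_scores)
d_s = similarity_distribution(" ".join(tokens), 0.25, list(word_scores), char_jaccard)
print(d_f)
print(d_s)
```

Passing the similarity measure as a parameter keeps the sketch independent of any particular library; a real semantic similarity function with the same `(word, text) -> float` shape could be substituted.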

3 Experiment

3.1 Dataset About Cloud Gaming

Because no standard dataset of UGC comment texts about CGQoE exists, the first step toward this study's goal is to create one:

  1. In the first step, crawl 147,386 comment texts and their corresponding 5-point Likert scale scores from four cloud game platforms. The comment texts are the dataset for EDL in this study, and the 5-point Likert scale score provided by the user is gathered as a criterion for evaluating the reliability.

  2. In the second step, clean the data in line with the purpose of this study: remove non-Chinese characters such as numbers, emoticons, English text, and garbled characters; remove duplicate "water army" data, that is, the same comment posted by multiple users; and remove comments with fewer than 7 characters (so that at least two feature words are included, which makes it easier to use multidimensional emotional words for structured text emotion distribution). This leaves 123,678 comment texts.

  3. The third step is to use snowNLP to score the emotion of the comment texts.

  4. The fourth step is to classify each UGC text as positive or negative based on the emotion score: greater than 0.5 is positive and less than 0.5 is negative.

  5. The fifth step is to classify each UGC text as positive or negative based on the scale score: greater than 3 is positive and less than 3 is negative.

  6. The sixth step is to take the classification based on the user's scale score in the fifth step as the criterion and remove the data whose text emotion classification in the fourth step is inconsistent with it, leaving 83,644 comment texts.

After removing the texts with inconsistent sentiment classification, the Pearson correlation coefficient between the snowNLP-based text emotion score and the user's scale score rose from 0.46 to 0.91. This shows that the distribution of the emotional information in the retained comment texts closely matches the distribution of the users' scale scores.
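The cleaning and consistency steps above can be sketched as follows, using only the standard library. Several details are assumptions not stated in the text: "Chinese characters" is taken to mean the CJK Unified Ideographs range, the first copy of a duplicated comment is kept, and rows sitting exactly at a neutral point (0.5 or 3) are dropped. The emotion score is expected to be supplied by the caller (snowNLP in this study).

```python
import re
from typing import List, Tuple

# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def clean_comments(rows: List[Tuple[str, int]],
                   min_chars: int = 7) -> List[Tuple[str, int]]:
    """rows are (comment text, 1-5 scale score) pairs.

    Strips non-Chinese characters, then drops duplicates and short comments.
    """
    seen = set()
    cleaned = []
    for text, rating in rows:
        text = NON_CHINESE.sub("", text)
        if len(text) < min_chars or text in seen:
            continue  # too short, or a repeat of an earlier comment
        seen.add(text)  # assumption: keep the first copy of a duplicate
        cleaned.append((text, rating))
    return cleaned


def polarity(value: float, neutral: float) -> int:
    """Return 1 above the neutral point, -1 below it, and 0 exactly on it."""
    return (value > neutral) - (value < neutral)


def keep_consistent(rows: List[Tuple[str, float, int]]) -> List[Tuple[str, float, int]]:
    """rows are (text, emotion score in [0, 1], 1-5 scale score) triples.

    Keeps rows whose text polarity (threshold 0.5) agrees with the
    scale-score polarity (threshold 3); neutral rows are dropped (assumption).
    """
    kept = []
    for row in rows:
        text_pol = polarity(row[1], 0.5)
        if text_pol != 0 and text_pol == polarity(row[2], 3):
            kept.append(row)
    return kept


# Hypothetical rows: only the first and last have matching polarities.
demo = [("c1", 0.9, 5), ("c2", 0.9, 1), ("c3", 0.5, 3), ("c4", 0.1, 2)]
print(len(keep_consistent(demo)), "of", len(demo), "rows kept")
```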

3.2 Creation of Cloud Gaming QoE DSSL

The UGC text was segmented using jieba.cut, which generated 2,412 feature words. Based on the expertise of CGQoE, the feature words unrelated to cloud gaming QoE were removed, and a DSSL of cloud gaming QoE comprising 62 feature words was created.

In this DSSL, 22 words were positive and 40 were negative (based on the emotion score) (Table 1).

Table 1. The DSSL of CGQoE

3.3 Filtration of Dataset Based on the CGQoE DSSL

When the frequency approach was first used for emotion distribution labeling, the correlation between the labeling results and the text emotion scores was only 0.002, which was not statistically significant. Analysis of the text content showed the cause of this discrepancy: users' UGC texts contain feature words unrelated to the CGQoE, such as "fun" and "enjoyable," which are not used for emotion distribution labeling because this study focuses only on QoE as affected by network quality of service (QoS). In other words, the original UGC content carries rich, multidimensional, unstructured information and is not sufficiently focused on the CGQoE-related content that this study addresses. Techniques for improving content validity must therefore be explored when UGC is applied to the structured measurement of a particular domain.

Consequently, the dataset was further filtered with a focus on the study topic of QoE variation due to QoS. The dataset, which contained 83,644 items in total, was filtered using the 62 feature words from the lexicon. Items that did not contain any of the feature words were eliminated, leaving a total of 9,193 items in the new dataset.
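A minimal sketch of this filter, assuming each comment has already been segmented into tokens. The function name and the three-word lexicon excerpt are hypothetical ("stuck" and "clearly" appear in this study's examples; "delay" is added for illustration).

```python
from typing import List, Set


def filter_by_lexicon(segmented_comments: List[List[str]],
                      lexicon: Set[str]) -> List[List[str]]:
    """Keep only comments that contain at least one lexicon feature word."""
    return [tokens for tokens in segmented_comments
            if any(tok in lexicon for tok in tokens)]


lexicon = {"stuck", "delay", "clearly"}  # hypothetical excerpt of the 62 words
comments = [["game", "fun"], ["always", "stuck"], ["picture", "clearly", "delay"]]
kept = filter_by_lexicon(comments, lexicon)
print(len(kept), "of", len(comments), "comments kept")
```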

3.4 Emotion Distribution Learning of UGC Based on DSSL

Based on the new dataset (9,193 items), comment texts are labeled with a structured emotion distribution. In the frequency approach, the emotion distribution score of each text is generated by multiplying the frequency of the feature word in the text by the emotion score of the feature word. The average correlation coefficient between the emotion distribution labeling results from the frequency approach and the emotion score of the text is 0.60.

In the similarity approach, the emotion distribution score of each text is generated by multiplying the semantic similarity between the 62 feature words in the CGQoE lexicon and each comment text by the emotion score of each comment text. The average correlation coefficient between the emotion distribution labeling results from the similarity approach and the emotion score of the text is 0.87.

In general, the labeling results of both the frequency approach and the similarity approach were highly correlated with the emotion scores of the UGC comment texts.
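For reference, the Pearson coefficient used in these comparisons can be computed as in the sketch below. The text does not spell out how a multi-word emotion distribution is reduced before it is correlated with the text emotion score; the sketch assumes, purely for illustration, that the components of each distribution are summed into one score per text. All data values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def distribution_score(distribution: Dict[str, float]) -> float:
    """Assumption: collapse a distribution to one number by summing it."""
    return sum(distribution.values())


# Hypothetical per-text distributions and text emotion scores.
distributions: List[Dict[str, float]] = [
    {"stuck": 0.2, "delay": 0.0},
    {"stuck": 0.0, "delay": 0.9},
    {"stuck": 0.6, "delay": 0.9},
]
text_scores = [0.1, 0.5, 0.9]
r = pearson([distribution_score(d) for d in distributions], text_scores)
print(round(r, 2))
```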

Manual evaluation revealed that snowNLP misjudged the emotion of certain feature words in the lexicon. In some cases, positive QoE words were identified as negative, or negative words were not identified as negative: for example, "stuck" was rated 0.50 by snowNLP (neutral) although it should be negative, and "clearly" was rated 0.33 (negative) although it should be positive. These scores were therefore corrected manually using the average score of each word's synonyms. After this modification, the recognition rate of textual emotion tendency increased to 88.92%.
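The synonym-average correction might look like the sketch below. The text gives no details of how synonyms were chosen, so the synonym list and the synonym scores here are hypothetical; only the 0.50 score for "stuck" comes from the example above.

```python
from typing import Dict, List


def corrected_score(word: str,
                    scores: Dict[str, float],
                    synonyms: Dict[str, List[str]]) -> float:
    """Replace a word's score with the mean score of its known synonyms.

    Falls back to the word's own score when no scored synonym is listed.
    """
    synonym_scores = [scores[s] for s in synonyms.get(word, []) if s in scores]
    if not synonym_scores:
        return scores[word]
    return sum(synonym_scores) / len(synonym_scores)


# "stuck" = 0.50 is from the text; the synonyms and their scores are made up.
scores = {"stuck": 0.50, "laggy": 0.20, "frozen": 0.10, "smooth": 0.90}
synonyms = {"stuck": ["laggy", "frozen"]}

print(round(corrected_score("stuck", scores, synonyms), 2))
```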

3.5 Correction of Dataset Based on DSSL Labeling Results

Next, the causes of the disparity between the emotional tendency of the scale score and that of the emotion score were explored, based on the results of emotion distribution labeling using the frequency approach. Of the 9,193 data points, 1,009 had a consistency value of 0, indicating inconsistent emotional tendencies. The causes of these inconsistencies were analyzed through manual evaluation, and the results are shown in Table 2.

Table 2. Reasons for the inconsistent emotional tendency

  • The 1st reason: Negative words such as “no,” “won't,” and “rarely” modify the feature words and reverse the emotional tendency. It is therefore necessary to determine whether a negative word precedes or follows the feature word.

  • The 2nd reason: The feature words in the comment, as in "continuous avatar frame" and "used for a long time," are not used to describe the CGQoE. Such comment texts do not need to be modified and are deleted directly.

  • The 3rd reason: The feature word evaluates other situations in order to highlight the recent cloud game experience, as in “my network does not play other cards” and “I don't get stuck because your network is stuck.” Such comments do not need to be modified and are deleted directly.

  • The 4th reason: The user mentions disadvantages in the comment but considers the overall cloud gaming experience superior, which results in a higher scale score, as in "I'm stuck, stuck at 99%, yet this game is still rather pleasant; recommended." Here the user cites "stuck" as the worst part of the cloud game, yet the scale score remains at 5. In such cases, the scale score must be modified to reflect the emotional tendency of the comment text about CGQoE.

  • The 5th reason: The comment itself is logically confused. Such comments are deleted directly.
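The check described under the 1st reason, whether a negative word precedes or follows the feature word, could be automated along the lines of the sketch below. The study handled these cases by manual evaluation, so this is only an illustration: the one-token window, the function name, and the addition of "not" to the negation list are assumptions.

```python
from typing import List

# "no", "won't", "rarely" come from Table 2; "not" is added as an assumption.
NEGATIONS = {"no", "not", "won't", "rarely"}


def polarity_with_negation(tokens: List[str],
                           feature_word: str,
                           base_polarity: int,
                           window: int = 1) -> int:
    """Flip the feature word's polarity if a negation is adjacent to it.

    Looks `window` tokens before and after the first occurrence of the
    feature word. Returns 0 when the feature word is absent.
    """
    if feature_word not in tokens:
        return 0
    i = tokens.index(feature_word)
    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    negated = any(tok in NEGATIONS for tok in neighbours)
    return -base_polarity if negated else base_polarity


# "stuck" is negative (-1) on its own but positive when negated.
print(polarity_with_negation(["it", "is", "not", "stuck"], "stuck", -1))
print(polarity_with_negation(["it", "is", "stuck"], "stuck", -1))
```

A fixed window is a deliberate simplification; it would miss negations that sit further from the feature word, which is one reason manual review remains useful.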

3.6 Result

After the above analysis and manual correction, a new dataset of 6,692 items was obtained. In the similarity approach, the criterion validity of the cloud gaming DSSL improved from r = 0.87 (p < .01) to r = 0.88 (p < .01), a significant correlation with users’ rating scores. In the frequency approach, the criterion validity of the DSSL-based cloud gaming QoE improved from r = 0.60 (p < .01) to r = 0.65 (p < .01), also a significant correlation with users’ rating scores. The recognition rate of textual emotion tendency increased to 95.98%. These results indicate that the DSSL has good criterion validity.

To analyze the discriminative power of the cloud gaming DSSL in differentiating the QoE of eight cloud gaming services, the sentiment score of each cloud gaming service was calculated (Table 3). The DSSL-based sentiment scores of the eight services were consistent with the sentiment scores based on the whole review sentence and with the rating scores. This result indicates that the DSSL has discriminative power comparable to that of whole-sentence sentiment and rating scores.

Table 3. Sentiment scores and rank of sentiment scores by different methods
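One simple way to compare methods in the spirit of Table 3 is to average each method's scores per service and compare the resulting rank orders. The sketch below assumes this procedure for illustration; the service names and all scores are hypothetical and are not the values reported in Table 3.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_by_service(records: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average the scores of (service, score) records per service."""
    grouped = defaultdict(list)
    for service, score in records:
        grouped[service].append(score)
    return {service: sum(vals) / len(vals) for service, vals in grouped.items()}


def rank_services(means: Dict[str, float]) -> Dict[str, int]:
    """Rank services by mean score; rank 1 is the highest mean."""
    ordered = sorted(means, key=means.get, reverse=True)
    return {service: rank for rank, service in enumerate(ordered, start=1)}


# Hypothetical DSSL-based scores and 5-point ratings for three services.
dssl_records = [("A", 0.8), ("A", 0.6), ("B", 0.4), ("C", 0.5), ("C", 0.7)]
rating_records = [("A", 5), ("A", 4), ("B", 2), ("C", 4), ("C", 3)]

dssl_rank = rank_services(mean_by_service(dssl_records))
rating_rank = rank_services(mean_by_service(rating_records))
print(dssl_rank == rating_rank)
```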

4 Discussion and Future Work

This study attempts to develop a domain specific sentiment lexicon from UGC texts with abundant emotional information for evaluating the CGQoE, based on two EDL methods: the similarity approach and the frequency approach. It demonstrated the potential of transforming DSSL into a psychometric scale of QoE in cloud gaming, considering its high criterion validity and discriminative power. Compared with the single item rating score and sentiment score of a review, DSSL provides a structural distribution of the sentiment on 62 indicators. This study thus leaves room for future studies to develop a scale with 62 indicators and examine the factor structure using psychometric methods.