1 Introduction

With the rapid development of online social media such as Weibo and Weixin in China, consumers’ opinions on products are being exchanged at unprecedented scale and in unprecedented detail. This online word of mouth (WOM) has the potential to influence both firms’ product design strategies and consumers’ purchasing decisions (Chen and Xie, 2008; Zhang et al., 2009; Lau et al., 2014). Opinion mining techniques have therefore emerged in response to the need to retrieve useful information from these reviews quickly.

Consumers typically comment on specific features of a given product in online reviews. The collocation of a product feature (such as ‘屏幕(screen)’ and ‘外观(appearance)’ of a cellphone) and its corresponding opinion (such as ‘好(good)’ and ‘差(poor)’) can thus be considered the key component for extracting consumers’ feedback on the product from online reviews. The pair < feature, opinion > is seen as the basic appraisal expression unit of online reviews, and hence its identification is the fundamental task in opinion mining. Unlike English reviews, Chinese online reviews are characterized by terse language, vague semantics and complex syntax. This style of language and these ways of expression greatly increase the difficulty of opinion mining. Therefore, this paper aims at identifying appraisal expressions in Chinese online reviews.

The remainder of this paper is organized as follows. We review the literature in the ‘Literature Review’ section and introduce the method of appraisal expression identification in the ‘Proposed Approach’ section. We conduct a comparative experiment on Chinese online cellphone reviews and apply the method to a comparative evaluation of two cellphones in the ‘Experiment and Application’ section. We conclude the paper and outline possible further research in the ‘Conclusions’ section.

2 Literature Review

In existing research on appraisal expression identification, there are two main kinds of methods: statistic orientation and semantic orientation.

The statistic orientation method first extracts features and then identifies the opinions occurring in their vicinity. For example, Hu and Liu (2004) utilized association rules to extract frequent nouns and noun phrases as product features, and identified the adjectives closest to each feature as its opinions. Su et al. (2008) proposed a mutual reinforcement method to analyze the hidden sentiment association between noun features and adjectival opinions. Zhang et al. (2010) identified product features based on conditional random fields (CRFs) and identified the corresponding opinions based on the syntactic tree. With the rapid development of techniques in statistics and probability theory, some studies have brought state-of-the-art probabilistic modelling into opinion mining, detecting the polarity of text at a much finer-grained word level by computing probabilistic measures of word association. For example, Titov and McDonald (2008) proposed a Topic-Sentiment Model (TSM) and Lin and He (2009) presented a Joint Sentiment/Topic (JST) model, both of which jointly detect topic and predict sentiment at the document level. Jiang, Meng and Yu (2011) analyzed the change of topic sentiment based on Probabilistic Latent Semantic Indexing (PLSI).

However, the statistic orientation method identifies only high-frequency nouns and noun phrases as product features, and neglects low-frequency terms and phrases in other parts of speech (e.g. verbs). Moreover, a product feature and its corresponding opinion are not always close to each other, and the opinion identified may not be the one directed at the feature, for example because of a missing punctuation mark or an implicit feature that does not appear in the text. The method is therefore somewhat heuristic.

The semantic orientation method exploits linguistic knowledge, such as language patterns, syntactic relationships and sentiment lexicons, to identify appraisal expressions in online reviews. For example, Popescu and Etzioni (2005) first extracted nouns and noun phrases as product features, and then utilized 10 extraction rules to identify opinions based on syntactic dependencies. Wilson et al. (2005) developed the OpinionFinder system to identify an opinion and its targeted feature based on a hand-crafted sentiment lexicon. Zhuang, Jing and Zhu (2006) manually selected features and opinions in movie reviews from WordNet, and used a dependency grammar graph to detect appraisal expressions. Bloom, Garg and Argamon (2007) extracted adjectival opinions based on a hand-built lexicon, and identified their corresponding features according to 31 predefined syntactic rules. Yao et al. (2008) manually created 278 product features and treated an opinion as a chunk of information consisting of three slots < subject, attribute, value >. Miao, Li and Zeng (2010) first took high-frequency nouns as product features and high-frequency adjectives, adverbs and verbs as opinions, and then identified appraisal expressions based on manually predefined syntactic rules. Zhao et al. (2010) applied automatically selected syntactic paths to detect appraisal expressions, with the help of an edit-distance-based path matching method. Vu et al. (2011) extracted product features and opinions based on Vietnamese syntax rules and synonyms in the VietSentiWordnet dictionary. Qiu et al. (2009, 2011) proposed a semi-supervised method that began with an initial opinion lexicon consisting of manually selected opinion word seeds, and extracted new sentiment words based on the relations between opinions and features described in dependency trees. Somprasertsri and Lalitrojwong (2010) also built a domain knowledge base to store information such as synonyms of features and sentiments of opinions within a certain domain. Lee and Bradlow (2011) applied a constrained-logic program to simultaneously cluster phrases referring to the same product feature and discover the underlying properties of a given product feature.

Compared with the association rules used in the statistic orientation method, syntactic rules contain more useful linguistic knowledge, so the semantic orientation method achieves better precision. However, since these rules are mostly applied to text with simple and regular grammar, the method has a low recall rate, especially for Chinese online reviews with complex syntax.

Therefore, this paper proposes a semantic lexicon-based method to identify appraisal expressions in Chinese online reviews by thoroughly analyzing the ways of expression and the style of language in Chinese.

3 Proposed Approach

3.1 Basic Procedure of the Proposed Approach

The proposed approach follows the semantic orientation method, but from a different perspective. Unlike previous methods that use language patterns or syntactic rules to identify appraisal expressions, this approach semi-automatically builds semantic lexicons by analyzing the words in a manually labeled corpus and their semantic relationships, such as synonymy. In this way, the various verbal expressions of product features are identified correctly, semantic ambiguity is removed efficiently, and missing subjects are supplemented by the semantic lexicons. The basic procedure of the proposed approach is shown in Fig. 1.

Fig. 1. Basic procedure of appraisal expressions identification

3.2 Step 1: Word Segmentation and POS Tagging

Traditional document pre-processing procedures such as stop word removal, word segmentation and part-of-speech (POS) tagging are applied to the online product reviews. This step splits the text into sentences and produces the POS tag for each word (whether the word is a noun, verb, adjective or adverb). The following shows a sentence with POS tags.

“外观/n时尚/a,/w全/a触摸/v屏/n,/w用/v的/u方便/a。/w” (Fashionable/a appearance/n,/w all/a touch/v screen/n,/w convenient/a for/u use/v./w).

Each sentence is saved in the review database along with its POS tag information, and a transaction file is then created to store only the notional words in the sentence, that is, the nouns, verbs, adjectives and adverbs.
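As a minimal sketch of this step, the following Python snippet parses a tagged sentence in the `word/tag` format shown above and keeps only the notional words. The tag prefixes (`n`, `v`, `a`, `d`) are an assumption based on ICTCLAS-style tag sets, and the function names are hypothetical. The regular expression assumes that tags are lowercase ASCII letters and that words do not begin with such letters, which holds for the example sentence but not necessarily for mixed Chinese and English text.

```python
import re

# Assumed ICTCLAS-style tag prefixes: n = noun, v = verb, a = adjective, d = adverb.
NOTIONAL_PREFIXES = ("n", "v", "a", "d")

# A token is the shortest run of characters followed by "/" and a lowercase tag.
TOKEN = re.compile(r"(.+?)/([a-z]+)")


def parse_tagged(sentence):
    """Split a 'word/tag' string into a list of (word, tag) pairs."""
    return TOKEN.findall(sentence)


def notional_words(pairs):
    """Keep only nouns, verbs, adjectives and adverbs (the transaction file)."""
    return [(word, tag) for word, tag in pairs if tag.startswith(NOTIONAL_PREFIXES)]


tagged = "外观/n时尚/a,/w全/a触摸/v屏/n,/w用/v的/u方便/a。/w"
pairs = parse_tagged(tagged)
print(notional_words(pairs))
```

For the example sentence, the punctuation (`w`) and the auxiliary ‘的’ (`u`) are dropped, leaving seven notional words.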

3.3 Step 2: Labeling Product Features and Opinions

Product features are the words and phrases describing the components, functions and properties of a given product (Ding, Liu and Yu, 2008). Unlike previous studies that take only nouns and noun phrases as product features (e.g. Popescu and Etzioni, 2005), this paper holds that product features include not only the nouns representing a component, the appearance, a function or the performance of a given product, but also the verbs describing behaviors in the use of the product. For example, in cellphone reviews, the verb ‘操作(operate)’ describes a behavior of using a cellphone.

Opinions are the words and phrases used to subjectively evaluate particular product features. Sentiment words are believed to include not only adjectives and adverbs but also nouns, verbs and other parts of speech (Pang and Lee, 2008). Thus this paper takes adjectives, nouns, verbs and adverbs as opinion words. A semantic lexicon containing product features and opinion words is then established to identify appraisal expressions in the set of notional words: a candidate word is kept if a match is found in the lexicon, and it is marked with F (Feature) or O (Opinion).
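The lexicon lookup can be sketched as follows. The lexicon entries are hypothetical placeholders rather than the lexicons built in this paper, and giving the feature lexicon precedence when a word appears in both lexicons is an assumption made only for this illustration.

```python
# Hypothetical lexicon entries for illustration only.
FEATURE_LEXICON = {"外观", "屏", "屏幕", "操作"}
OPINION_LEXICON = {"时尚", "方便", "好", "差"}


def label_candidates(words):
    """Keep words found in a lexicon and mark them F (feature) or O (opinion)."""
    labeled = []
    for word in words:
        if word in FEATURE_LEXICON:      # assumption: features take precedence
            labeled.append((word, "F"))
        elif word in OPINION_LEXICON:
            labeled.append((word, "O"))
        # words found in neither lexicon are discarded
    return labeled


print(label_candidates(["外观", "时尚", "全", "触摸", "屏", "用", "方便"]))
```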

3.4 Step 3: Refining Product Features and Opinions

1. Reducing Redundancy. In Chinese online reviews, product features have various verbal expressions, so this paper proposes four rules to reduce the redundancy of the feature set.

  1. Integrating semantic synonyms. For example, in cellphone reviews, ‘价格(price)’, ‘价值(value)’ and ‘价钱(expense)’ have the same meaning, so they are integrated into one feature ‘价格(price)’.

  2. Integrating contextual synonyms. For example, in cellphone reviews, ‘存储卡(storage card)’, ‘扩展卡(expansion card)’ and ‘SD卡(SD card)’ are not synonyms in the strict sense but, in this context, refer to the same feature ‘记忆卡(memory card)’, so they are integrated into one feature ‘记忆卡(memory card)’.

  3. Integrating specific features into one general feature. For example, in cellphone reviews, ‘触屏(touch screen)’, ‘主屏(main screen)’ and ‘电容屏(capacitive screen)’ all belong to the general feature ‘屏幕(screen)’, so they are integrated into one feature ‘屏幕(screen)’.

  4. Integrating function features and their corresponding behaviors into one feature. For example, in cellphone reviews, ‘播放(broadcast)’ is the behavior of ‘音响(loudspeaker)’, so they are integrated into one feature ‘音响(loudspeaker)’.
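The four rules above can be viewed as four lookup tables that are merged into a single variant-to-canonical mapping. The sketch below uses only the examples given in the rules; the table names and the function `canonical_feature` are hypothetical, and a full lexicon would contain many more entries.

```python
# One table per rule, mapping a variant expression to its canonical feature.
RULES = {
    "semantic_synonym": {"价值": "价格", "价钱": "价格"},
    "contextual_synonym": {"存储卡": "记忆卡", "扩展卡": "记忆卡", "SD卡": "记忆卡"},
    "specific_to_general": {"触屏": "屏幕", "主屏": "屏幕", "电容屏": "屏幕"},
    "behavior_to_function": {"播放": "音响"},
}

# Flatten the rule tables into one lookup table.
CANONICAL = {
    variant: canonical
    for table in RULES.values()
    for variant, canonical in table.items()
}


def canonical_feature(word):
    """Return the canonical feature for a word, or the word itself if unlisted."""
    return CANONICAL.get(word, word)


print([canonical_feature(w) for w in ["价钱", "SD卡", "电容屏", "播放", "屏幕"]])
```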

2. Removing Ambiguity. Some components or functions of a given product may share the same attribute; for example, each part of a cellphone has the ‘质量(quality)’ attribute. These common attributes usually appear in a sentence without a specific determiner describing their hosts, and thus cause semantic ambiguity. For instance, a review that only states ‘质量好(quality is good)’ raises a question: the quality of which part? Therefore, in order to remove semantic ambiguity in Chinese online reviews, this paper presents a matching rule based on the co-occurrence of the attribute and its determiner.

In the field of information retrieval (IR), mutual information (MI) is commonly used to compute the co-occurrence of two words. However, this measure ignores the implicit relationships between the two words. Therefore, this paper introduces a variant of expected mutual information from Lau et al. (2009a, 2009b), the balanced mutual information (BMI), to calculate the co-occurrence of words \( w_{i} \) and \( w_{j} \); BMI treats both the presence and the absence of the words as evidence of an implicit association. Furthermore, a windowing process is conducted to filter noisy terms, since an attribute and its determiner are usually near each other, and the closer \( w_{i} \) and \( w_{j} \) are, the stronger their relationship.

$$ \begin{aligned} BMI\left( w_{i} ,w_{j} \right) = {} & \beta \times \left[ Pr(w_{i} ,w_{j} ) \times \log_{2} \left( \frac{Pr(w_{i} ,w_{j} )}{Pr(w_{i} )Pr(w_{j} )} \right) + Pr(\neg w_{i} ,\neg w_{j} ) \times \log_{2} \left( \frac{Pr(\neg w_{i} ,\neg w_{j} )}{Pr(\neg w_{i} )Pr(\neg w_{j} )} \right) \right] \\ & - (1 - \beta ) \times \left[ Pr(w_{i} ,\neg w_{j} ) \times \log_{2} \left( \frac{Pr(w_{i} ,\neg w_{j} )}{Pr(w_{i} )Pr(\neg w_{j} )} \right) + Pr(\neg w_{i} ,w_{j} ) \times \log_{2} \left( \frac{Pr(\neg w_{i} ,w_{j} )}{Pr(\neg w_{i} )Pr(w_{j} )} \right) \right] \end{aligned} $$
(1)

A virtual window of \( \sigma \) words is moved from left to right, one word at a time, until the end of each document. According to previous research (Lau et al., 2009a, 2009b), a text window of 5 to 10 terms is effective. Because expressions in Chinese product reviews are long and colloquial, we take 8 terms as the size of the text window \( (\sigma = 8) \). In Eq. (1), \( Pr(w_{i} ) = \frac{N(\sigma_{w_{i}} )}{N(\sigma )} \) denotes the probability that word \( w_{i} \) appears in the text window, where \( N(\sigma_{w_{i}} ) \) is the number of windows containing \( w_{i} \) and \( N(\sigma ) \) is the total number of windows obtained from a document. Similarly, \( Pr(\neg w_{i} ) \) denotes the probability that \( w_{i} \) does not appear in the text window. \( Pr(w_{i} ,w_{j} ) = \frac{N(\sigma_{w_{i} ,w_{j}} )}{N(\sigma )} \) denotes the joint probability that both words are present in the text window, where \( N(\sigma_{w_{i} ,w_{j}} ) \) is the number of windows containing both \( w_{i} \) and \( w_{j} \). Similarly, \( Pr(\neg w_{i} ,\neg w_{j} ) \) denotes the joint probability that both words are absent from the text window, and \( Pr(\neg w_{i} ,w_{j} ) \) or \( Pr(w_{i} ,\neg w_{j} ) \) denotes the joint probability that only one of them appears in the text window. The parameter \( \beta \in [0.5,\,0.7] \) adjusts the relative weight of positive and negative evidence. After computing the co-occurrence of \( w_{i} \) and \( w_{j} \), a linear normalization \( ass_{normal} = \frac{ass - ass_{\min } }{ass_{\max } - ass_{\min } } \in [0,1] \) is carried out to keep all values in the interval from 0 to 1.
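The windowing process, Eq. (1) and the linear normalization can be sketched in Python as follows. This is an illustrative reading of the definitions above rather than the implementation used in the experiment: the function names are hypothetical, \( \beta = 0.6 \) is an arbitrary value inside the stated range, and any term with a zero probability is taken to contribute zero (the usual \( 0 \log 0 = 0 \) convention), which the text does not specify.

```python
import math


def sliding_windows(tokens, size=8):
    """Slide a window of `size` tokens one token at a time over a document."""
    if len(tokens) <= size:
        return [set(tokens)]
    return [set(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def _term(joint, p_a, p_b):
    """joint * log2(joint / (p_a * p_b)); assumed to be 0 if any probability is 0."""
    if joint == 0 or p_a == 0 or p_b == 0:
        return 0.0
    return joint * math.log2(joint / (p_a * p_b))


def bmi(wins, wi, wj, beta=0.6):
    """Balanced mutual information of wi and wj over a list of window sets."""
    n = len(wins)
    n_i = sum(wi in w for w in wins)
    n_j = sum(wj in w for w in wins)
    n_ij = sum(wi in w and wj in w for w in wins)
    p_i, p_j = n_i / n, n_j / n
    p_ij = n_ij / n                       # both present
    p_i_nj = (n_i - n_ij) / n             # only wi present
    p_ni_j = (n_j - n_ij) / n             # only wj present
    p_ni_nj = (n - n_i - n_j + n_ij) / n  # both absent
    positive = _term(p_ij, p_i, p_j) + _term(p_ni_nj, 1 - p_i, 1 - p_j)
    negative = _term(p_i_nj, p_i, 1 - p_j) + _term(p_ni_j, 1 - p_i, p_j)
    return beta * positive - (1 - beta) * negative


def min_max(values):
    """Linear normalization of association scores into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy windows: "a" and "b" always co-occur, while "a" and "c" never do.
toy = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
scores = [bmi(toy, "a", "b"), bmi(toy, "a", "c")]
print(scores, min_max(scores))
```

In the toy example the pair that always co-occurs receives a positive score and the pair that never co-occurs receives a negative one, so after normalization they sit at the two ends of the unit interval.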

3. Solving Word Deficiency. In Chinese online reviews, subjects are sometimes missing but implied by the context. These missing subjects are regarded as implicit features of the product (Ding, Liu and Yu, 2008). For example, when a review states ‘有点重(a little heavy)’, the missing subject ‘重量(weight)’ is implicitly indicated by ‘重(heavy)’. This paper identifies and supplements the implicit features with the help of the context. There are two kinds of opinion words. Those with clear and definite meanings are regarded as feature indicators (Ding, Liu and Yu, 2008); they are used to evaluate only a finite number of product features. For example, ‘便宜(cheap)’ indicates ‘价格(price)’ and ‘重(heavy)’ indicates ‘重量(weight)’. Those with general meanings are regarded as general opinions, which can be used to appraise any feature, such as ‘好(good)’ and ‘差(bad)’. Based on these two kinds of opinion words, word deficiency is resolved by matching feature indicators with the implicit features they indicate, and by matching general opinions with the closest feature in the same clause.
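A minimal sketch of this matching is given below. The indicator table contains only the two examples from the text, and the function name is hypothetical. The sketch assumes that an indicator supplies its implicit feature only when the clause contains no explicit feature, and that ties in distance are broken in favor of the leftmost feature; neither detail is specified above.

```python
# Feature indicators taken from the examples in the text.
FEATURE_INDICATORS = {"便宜": "价格", "重": "重量"}


def resolve_clause(tokens):
    """Pair each opinion in a clause with a feature.

    tokens: list of (word, label) in clause order, label being 'F' or 'O'.
    """
    features = [(i, w) for i, (w, label) in enumerate(tokens) if label == "F"]
    pairs = []
    for i, (word, label) in enumerate(tokens):
        if label != "O":
            continue
        if features:
            # Explicit feature available: take the closest one in the clause.
            _, feature = min(features, key=lambda p: abs(p[0] - i))
            pairs.append((feature, word))
        elif word in FEATURE_INDICATORS:
            # Subject missing: the indicator supplies the implicit feature.
            pairs.append((FEATURE_INDICATORS[word], word))
        # A general opinion with no feature in the clause stays unresolved.
    return pairs


print(resolve_clause([("重", "O")]))                 # '有点重' after labeling
print(resolve_clause([("屏幕", "F"), ("好", "O")]))
```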

3.5 Step 4: Mapping Product Features to Review Features

In order to summarize all customer reviews of a product, product features are further gathered and mapped to review features, which are the high-profile features mentioned most often in online reviews. The mapping rules between product features and review features are determined by their semantic relations. Three semantic relations are analyzed in this paper: attribute-to-host, part-to-whole and event-to-role.

  1. An attribute feature is mapped to its corresponding host feature. For example, in cellphone reviews, ‘颜色(color)’, ‘分辨率(resolution)’ and ‘亮度(brightness)’ are attributes of ‘屏幕(screen)’, and comments on these attributes are equivalent to comments on the screen, so these features are all mapped to ‘屏幕(screen)’.

  2. A component feature is mapped to its corresponding whole feature. For example, in cellphone reviews, ‘耳机(earphone)’, ‘记忆卡(memory card)’ and ‘数据线(data cable)’ all belong to ‘配件(accessory)’ (they are accessories of a cellphone), and comments on these components are equivalent to comments on the accessories, so these features are mapped to ‘配件(accessory)’.

  3. A feature representing the user’s perception of using the product is mapped to its corresponding behavior feature. For example, in cellphone reviews, ‘实用性(utility)’ and ‘操作性(operability)’ are both perceptions of operating a cellphone, so they are mapped to ‘操作(operate)’.
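To illustrate how the three relations support summarization, the sketch below maps product features to review features and tallies the mentions per review feature. The mapping holds only the examples listed above, the function name is hypothetical, and leaving unmapped features unchanged is an assumption.

```python
from collections import Counter

REVIEW_FEATURE = {
    # attribute-to-host
    "颜色": "屏幕", "分辨率": "屏幕", "亮度": "屏幕",
    # part-to-whole
    "耳机": "配件", "记忆卡": "配件", "数据线": "配件",
    # event-to-role
    "实用性": "操作", "操作性": "操作",
}


def review_feature_counts(product_features):
    """Count mentions per review feature; unmapped features map to themselves."""
    return Counter(REVIEW_FEATURE.get(f, f) for f in product_features)


print(review_feature_counts(["颜色", "亮度", "耳机", "屏幕", "操作性"]))
```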

3.6 Step 5: Identifying Appraisal Expressions

After labeling product features and opinions in step 2, four types of appraisal expressions are recognized as follows.

FO/OF represents a single product feature and a single opinion word. For example, the appraisal expression of “屏幕/F大/O” is < screen/F, big/O >.

FFO represents a group of multiple product features and a single opinion word. There are two kinds of relation among these product features. (1) Father-child relation. The meaning of a father-child phrase is conveyed by the child feature, as in “手机/F操作/F方便/O (cellphone/F operates/F conveniently/O)”, where ‘operates’ is a behavior of using a cellphone, so the appraisal expression is < operates/F, conveniently/O >. (2) Coordinating relation. For example, in “外观/F和操作系统/F都不错/O (appearance/F and operating system/F are good/O)”, the conjunction ‘and’ indicates a coordinating relation between ‘appearance’ and ‘operating system’, so the appraisal expressions are < appearance/F, good/O > and < operating system/F, good/O >.

FOO represents a group of multiple opinion words and a single product feature, and each opinion word can be regarded as the evaluation of the product feature. For example, “屏幕/F大/O而清晰/O (the screen/F is big/O and clear/O)”, where ‘big’ and ‘clear’ are both appraising ‘screen’, so the appraisal expressions are < screen/F, big/O > and < screen/F, clear/O >.

FFOFOO represents a combination of the FFO and FOO patterns, so we take the longest sequence starting with ‘F’ and ending with ‘O’, and divide it into several FFO and FOO segments.
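The four patterns can be sketched as a single pass over the labeled tokens of a clause. This is an illustration under stated assumptions, not the exact procedure of the paper: the conjunction list `COORD` is hypothetical, a coordinating relation is assumed whenever a listed conjunction appears between features, the father-child case is handled by keeping only the last feature of a run, and the OF order (opinion before feature) is left out for brevity.

```python
COORD = {"和", "与", "及"}  # assumed list of coordinating conjunctions


def extract_pairs(tokens):
    """Extract <feature, opinion> pairs from one clause.

    tokens: list of (word, label) with label 'F', 'O', or '' for other words.
    """
    pairs = []
    feats, ops, coordinated = [], [], False

    def flush():
        nonlocal feats, ops, coordinated
        if feats and ops:
            # Coordinated features all receive the opinions; otherwise the
            # last (child) feature carries the meaning of the phrase.
            targets = feats if coordinated else feats[-1:]
            pairs.extend((f, o) for f in targets for o in ops)
        feats, ops, coordinated = [], [], False

    for word, label in tokens:
        if label == "F":
            if ops:            # a feature after opinions starts a new segment
                flush()
            feats.append(word)
        elif label == "O":
            if feats:
                ops.append(word)
        elif word in COORD and feats and not ops:
            coordinated = True
    flush()
    return pairs


print(extract_pairs([("手机", "F"), ("操作", "F"), ("方便", "O")]))
print(extract_pairs([("外观", "F"), ("和", ""), ("操作系统", "F"), ("都", ""), ("不错", "O")]))
print(extract_pairs([("屏幕", "F"), ("大", "O"), ("而", ""), ("清晰", "O")]))
```

A sequence such as F F O F O O is split at the feature that follows an opinion, which yields one FFO segment and one FOO segment.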

4 Experiment and Application

4.1 Corpus

The experiment is based on online cellphone reviews and is designed to test and verify the effectiveness of the proposed approach. The corpus is obtained from Taobao.com, the most popular e-commerce website in China; 1000 reviews are taken as the training corpus for establishing the lexicons, and another 1000 reviews are taken as the testing corpus.

In this experiment, a natural language processing tool, ICTCLAS (the Chinese Lexical Analysis System developed by the Institute of Computing Technology, Chinese Academy of Sciences), is employed for word segmentation and POS tagging of each review. All punctuation marks are replaced with commas.

Two researchers manually identify and label the product features, opinion words and appraisal expressions in the training corpus. In order to reduce subjective deviation in the labeling process, 20 reviews are selected at random and the Kappa statistic is computed to test the consistency of the labeling results. The Kappa value is 0.72, higher than 0.7, indicating acceptably consistent labeling. The labeling result is shown in Table 1.
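The text does not state which Kappa variant is used. Assuming Cohen’s kappa for two annotators, the agreement check could be computed as in the sketch below, where the function name and the toy labels are purely illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (not data from the experiment).
print(cohen_kappa(["F", "F", "O", "O"], ["F", "F", "O", "F"]))
```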

Table 1. Labeling result of training corpus

Table 1 demonstrates that about 96.3 % of online reviews contain appraisal expressions, indicating that appraisal expressions are the key component of online reviews. In addition, the number of opinion words is larger than the number of appraisal expressions, indicating the existence of implicit features, that is, product features (subjects) missing from the reviews. The number of product features is much larger than the number of opinion words, indicating that the FFO type (multiple product features with a single opinion word in one sentence) is much more common than the other types of appraisal expressions. Furthermore, each sentence contains on average more than 2 product features, opinion words and appraisal expressions, showing that the training corpus contains relatively rich information and is suitable for establishing semantic lexicons.

After labeling, repeated product features and opinion words are deleted to build the lexicons; the final sets of features and opinions are shown in Table 2.

Table 2. Result of features and opinions

Table 2 demonstrates that over 95 % of product features and opinion words are repeated across the online reviews, indicating that users tend to use a regular vocabulary when reviewing a particular product. The number of distinct product features and opinion words is therefore limited, and the lexicons established from these reviews are useful for identifying appraisal expressions.

4.2 Evaluation

In this paper, precision (P), recall (R) and F-score (F) are employed to measure the performance of the proposed approach: P = |A∩B|/|A|; R = |A∩B|/|B|; F = 2*P*R/(P + R), where A denotes the set of appraisal expressions identified by the algorithm and B denotes the set of appraisal expressions labeled by hand.
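These measures translate directly into set operations. In the sketch below, the function name and the toy pairs are illustrative, and returning 0 when a denominator is empty is an assumption made to avoid division by zero.

```python
def precision_recall_f(identified, labeled):
    """P, R and F for identified (A) versus hand-labeled (B) appraisal expressions."""
    a, b = set(identified), set(labeled)
    hits = len(a & b)
    p = hits / len(a) if a else 0.0
    r = hits / len(b) if b else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy sets of <feature, opinion> pairs (not data from the experiment).
A = {("屏幕", "大"), ("价格", "便宜"), ("电池", "好"), ("外观", "差")}
B = {("屏幕", "大"), ("价格", "便宜"), ("操作", "方便")}
print(precision_recall_f(A, B))
```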

4.3 Comparative Experiment

A comparative experiment is conducted on the Chinese cellphone reviews, comparing the approach proposed in this paper with two baselines: a statistic orientation method and a semantic orientation method.

Baseline 1: a method based on association rules and the nearest principle (Hu and Liu, 2004b). First, association rules are applied to identify the frequent items in the set of nouns and noun phrases as product features. Then, for each feature, the adjectives closest to the feature are extracted as opinion words. Finally, the identified opinion words are used to find and extract the low-frequency features.
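The nearest principle in the second step of this baseline can be illustrated with a short sketch. It covers only the nearest-adjective lookup, not the association rule mining; the function name is hypothetical, adjectives are assumed to carry tags beginning with `a`, and ties are broken in favor of the leftmost adjective.

```python
def nearest_adjective(tokens, feature_index):
    """Return the adjective closest to tokens[feature_index], or None.

    tokens: list of (word, pos_tag) pairs for one sentence.
    """
    candidates = [
        (abs(i - feature_index), word)
        for i, (word, tag) in enumerate(tokens)
        if tag.startswith("a")
    ]
    if not candidates:
        return None
    # min() with a key keeps the first (leftmost) candidate on ties.
    return min(candidates, key=lambda c: c[0])[1]


sentence = [("外观", "n"), ("时尚", "a"), ("屏", "n"), ("方便", "a")]
print(nearest_adjective(sentence, 0))
print(nearest_adjective(sentence, 2))
```

For the second feature the two adjectives are equally close, so the choice depends only on the tie-breaking rule, which illustrates why a purely positional rule can attach an opinion to the wrong feature.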

Baseline 2: a method based on syntactic rules (Popescu and Etzioni, 2005). First, high-frequency nouns and noun phrases are extracted as product features. Then 10 extraction rules are used to find the heads of potential opinion phrases, and each head word together with its modifier is returned as a potential opinion phrase. Finally, the sentiment orientation of each candidate opinion is calculated based on HowNet, and those with distinct polarity are determined to be opinion words.

Table 3 lists the performance of our semantic approach and the two baselines, and shows that our approach achieves a clear improvement in accuracy.

Table 3. Comparison on the performances of proposed approach and two baselines

Table 3 demonstrates that the first baseline performs worst on all three measures. This is because its procedure for obtaining features and opinions is empirical and heuristic, without any deep syntactic or semantic analysis. In contrast, the two semantic orientation methods (the second baseline and our approach) perform better, indicating that extracting deep syntactic and semantic relationships between features and opinions is more useful than simply considering their frequency and location.

In addition, our approach outperforms the second baseline, especially in recall, for the following three reasons: (1) the second baseline extracts only high-frequency product features and omits low-frequency ones; (2) its frequent nouns and noun phrases may include non-product features such as ‘配送速度(delivery speed)’; (3) its syntactic rules are not well suited to mining Chinese reviews, because Chinese online reviews are short, semantically vague and lacking in syntactic standardization.

4.4 Application

Moreover, the proposed approach is applied to a comparative evaluation of two popular cellphones on Taobao.com, the iPhone 5S and the Nokia 1050. For each phone, 100 online reviews are selected and the sentiment polarity of each appraisal expression is determined manually. The comparative evaluation of the two cellphones is shown in Fig. 2.

Fig. 2. Comparative evaluations of the two cellphones

Figure 2 supports the following conclusions: (1) The iPhone 5S receives markedly more positive comments on phone, screen, camera and appearance than the Nokia 1050, and thus gains more favor from younger people. This conclusion is consistent with the fact that the iPhone 5S is an entertainment-oriented phone. (2) The Nokia 1050 receives markedly more positive comments on price, battery and operation than the iPhone 5S. This conclusion is consistent with the fact that the Nokia 1050 is a practical cellphone and more suitable for older users than the iPhone 5S.

5 Conclusions

From the perspective of the ways of expression and the style of language in Chinese, this paper presents a semantic lexicon-based approach to extracting appraisal expressions from Chinese online reviews. The proposed approach overcomes the difficulties of mining short, semantically vague and syntactically complicated Chinese online reviews by establishing semantic lexicons to identify the various verbal expressions of product features, remove semantic ambiguity and supplement missing subjects. A comparative experiment on Chinese online cellphone reviews shows that the proposed approach outperforms two baselines, a statistic orientation method and a semantic orientation method. Moreover, the method is applied to a comparative evaluation of two cellphones, indicating that opinion mining can help customers as well as manufacturers learn the strengths and weaknesses of products without manually going through reviews.

Further research will be conducted in the following directions: (1) studying the knowledge structure in online reviews and exploring deeper semantic relations among features and opinions with the help of ontology; (2) classifying the sentiment polarity of the identified appraisal expressions automatically; (3) integrating opinion mining with statistical methods to analyze consumers’ needs.