
1 Introduction

With the rapid development of social networks, more and more people in China, especially young people, enjoy the convenience they bring. On microblogs, for example, people publish posts on various topics, such as entertainment news, political events and sports reports, and express their sentiment and opinions towards these topics through multiple forms of media. However, microblogs have distinctive features, such as sparse topics, contact relations, retweets, short messages, homophonic words, abbreviations, network language (popular words) and emoticons. These features make it very difficult to analyze a microblog’s topic and its sentiment.

To address these issues, we propose a new cascaded model that mines the topic of a microblog and takes into account the relationship between the topic and its sentiment. The cascaded model aims to identify a microblog’s topic and its sentiment more automatically and efficiently. It has three distinctive advantages: (i) a novel MB-LDA model, which extends LDA by taking both the contact relation and the document relation into consideration, is introduced to mine microblog topics, and the strong relationship between a topic and its sentiment is considered within one model; (ii) an attention network is introduced to identify the factors that affect the topic’s sentiment and to calculate the degree of influence of each factor; (iii) because both the MB-LDA model and the attention network are used when a Bi-RNN judges the sentiment towards the topic, the topic and its sentiment in a microblog are detected synchronously.

The rest of this paper is organized as follows. Section 2 briefly summarizes related work. Section 3 gives an overview of data construction, including the dictionaries of sentiment words, internet slang and emoticons. Section 4 gives an overview of the cascaded model, including its principles, graphical models and the related resources needed. The experimental results are reported in Sect. 5. Lastly, we conclude in Sect. 6.

2 Related Works

2.1 Topic Model

Current text topic recognition technologies fall mainly into three groups: traditional topic mining algorithms, topic mining algorithms based on linear algebra, and topic mining algorithms based on probability models. The traditional topic model can be traced back to text clustering: it maps the unstructured data in a text to points in a vector space using the VSM (vector space model), and then applies a traditional clustering algorithm to cluster the texts. Text clustering usually relies on partition-based, hierarchical or density-based algorithms, among others. However, these clustering algorithms generally depend on a distance calculation between texts, and such a distance is difficult to define for massive text collections. In addition, the clustering result only distinguishes categories and gives no semantic information, which makes it hard for people to interpret. LSA (latent semantic analysis), proposed by [1], is a method for mining text topics based on linear algebra. LSA uses SVD for dimensionality reduction to uncover the latent (semantic) structure of documents, so that correlations can then be queried and analyzed in a low-dimensional semantic space. With SVD and other mathematical methods, implicit correlations can be mined well. However, LSA does not solve the polysemy problem, because a word has only one coordinate in the semantic space (the average over its multiple meanings) instead of multiple coordinates expressing its different meanings. Moreover, SVD involves matrix operations with a large computational cost, and the results are negative in many dimensions, which makes the topics less intuitive to understand.

The third kind of topic model is the generative probability model. It assumes that a topic generates words according to certain rules, so that when the words of a text are known, the topic distribution of the text collection can be calculated by probabilistic inference. The most representative topic models are PLSA (probabilistic latent semantic analysis) and LDA (latent Dirichlet allocation). Building on LSA, PLSA was proposed by [2] and combines the maximum likelihood method with a generative model. It follows the dimension-reduction idea of LSA: a text represented with TF·IDF is high-dimensional data, the number of topics is limited and the topics correspond to a low-dimensional semantic space, so topic mining projects a document from the high-dimensional space into the semantic space by reducing the dimension. LDA is a breakthrough extension of PLSA that adds a Dirichlet prior distribution on top of PLSA. The authors of LDA [3] point out that PLSA does not use a unified probability model when calculating the probability of the topics corresponding to a document, that its many parameters lead to overfitting, and that it is difficult to assign a probability to a document outside the training set. To address these defects, LDA introduces hyperparameters and forms a three-layer “document-topic-word” Bayesian model; the model is then inferred with probabilistic methods to find the semantic structure of the text and to mine its topics.

In recent years, research on topic models has deepened and a variety of models have been derived, such as the Dynamic topic model [4] and the Syntactic topic model [5]. There are also models that consider the relationships between texts, such as Link-PLSA-LDA and HTM (Hypertext Topic Model). Link-PLSA-LDA is a topic model proposed by [6] for citation analysis. In this model, the cited text is generated by PLSA and the citing text is generated by LDA, and the model assumes that the two share the same topics. HTM is a topic model proposed by [7] for hypertext analysis. When generating text, HTM adds the influence of hyperlinks in order to mine topics and classify hypertext.

2.2 Microblog Sentiment Analysis

Sentiment analysis is one of the fastest growing research areas in computer science, which makes it challenging to keep track of all the activities in the area. Within sentiment analysis, polarity classification for Twitter has received attention for some time, for example in Tweetfeel, Twendz and Twitter Sentiment. In previous related work, [8] use distant supervision to acquire sentiment data. They treat tweets ending in positive emoticons like “:)” as positive and tweets ending in negative emoticons like “:(” as negative. They build models using Naive Bayes (NB), MaxEnt (ME) and Support Vector Machines (SVM), and they report that SVM outperforms the other classifiers. In terms of feature space, they try unigram and bigram models in conjunction with part-of-speech (POS) features, and they note that the unigram model outperforms all other models. However, the unigram model is not suitable for Chinese microblogs, and we make full use of the new emoticons that appear frequently in Chinese microblogs.

Another significant effort for sentiment classification on Twitter data is by [9]. They use polarity predictions from three websites as noisy labels to train a model. They propose using syntax features of tweets, such as retweets, hashtags, links, punctuation and exclamation marks, in conjunction with features like the prior polarity of words and the POS of words. To improve target-dependent Twitter sentiment classification, [10] incorporate target-dependent features and take the relations between tweets into consideration, such as retweets, replies and tweets published by the same person. We extend their approach by adding a variety of Chinese dictionaries of sentiment words, internet slang and emoticons, together with the contact relation and the document relation (forwarding), and then by using an attention network and a Bi-RNN to obtain the sentiment towards the topic.

The problem we address in this paper is to identify a microblog’s topic and its sentiment automatically and synchronously. The input of our task is therefore a collection of microblogs, and the output is the topic labels and the sentiment polarity assigned to each microblog.

3 Data Description

Microblogs allow users to post real-time messages, which are commonly displayed on the Web as shown in Fig. 1. “# #” identifies the microblog topic, “//” marks the user’s forwarding relation (document relation), and “@” specifies the user being addressed (contact relation).

Fig. 1. Chinese microblog example
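The three markers described above can be detected with a few lines of string processing. The following is a minimal sketch, not the authors' implementation: the function name `parse_microblog`, the returned field names and the URL-stripping step are assumptions made for illustration. Following the convention used later for MB-LDA, only a message that begins with “@” is treated as carrying a contact relation.

```python
import re

def parse_microblog(text):
    """Extract topic labels, the retweet flag and the leading contact (illustrative)."""
    # Assumption: drop URLs first so that "http://" is not mistaken for "//".
    cleaned = re.sub(r"https?://\S+", "", text)
    # "#...#" pairs delimit topic labels.
    topics = re.findall(r"#([^#]+)#", cleaned)
    # "//" marks a forwarding (document) relation.
    is_retweet = "//" in cleaned
    # Only a message that begins with "@" counts as a conversation message.
    match = re.match(r"@(\w+)", cleaned.lstrip())
    contact = match.group(1) if match else None
    return {"topics": topics, "is_retweet": is_retweet, "contact": contact}
```

For example, `parse_microblog("@alice #ipad# nice screen //@bob: agreed")` reports the topic `ipad`, a retweet relation, and `alice` as the contact.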

People usually use sentiment words, internet slang and emoticons to express their opinions and sentiment in microblogs. According to [11], sentiment words are one of the best sentiment feature representations of text, and a rich set of sentiment words is conducive to improving sentiment analysis. Internet slang, which more and more people use in social networks, is also an important factor for polarity classification. Constructing these resources is a significant foundation, but also time-consuming and labor-intensive work. To obtain the sentiment polarity of microblog topics, we construct several dictionaries with the same method as [12].

3.1 The Dictionary of Sentiment Words

To obtain more abundant sentiment words, we take the sentiment words provided by HowNet and the National Taiwan University Sentiment Dictionary (NTUSD) as the foundation, and then use a lexical fusion strategy to enrich the dictionary of sentiment words.

[13] uses a lexical fusion strategy to compute the degree of correlation between a test word and seed words that have an obvious sentiment polarity, and thereby obtains the sentiment polarity of the test word. In this paper we take 20 positive and 20 negative words as seed words, as shown in Tables 1 and 2.

Table 1. Seed words with positive polarity
Table 2. Seed words with negative polarity

The sentiment orientation of the test word is then computed as follows:

$$ SO(word)\, = \,\sum\limits_{pword \in Pset} {PMI(word,\,pword)} \, - \,\sum\limits_{nword \in Nset} {PMI(word,\,nword)} $$
(1)

where \( pword \) and \( nword \) are a positive and a negative seed word, and \( Pset \) and \( Nset \) are the collections of positive and negative seed words, respectively. \( PMI(word_{1}, word_{2}) \) is defined in formula (2), where \( P(word_{1} \& word_{2}) \), \( P(word_{1}) \) and \( P(word_{2}) \) are the probabilities that \( word_{1} \) and \( word_{2} \) co-occur, that \( word_{1} \) appears, and that \( word_{2} \) appears in a microblog, respectively. When \( SO(word) \) is greater than zero, the sentiment polarity of the word is positive; otherwise it is negative.

$$ PMI(word_{1} ,\,word_{2} )\, = \,\log (\frac{{P(word_{1} \& word_{2} )}}{{P(word_{1} )\,P(word_{2} )}}) $$
(2)
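Formulas (1) and (2) can be illustrated with a small sketch. It assumes that each microblog is already tokenized into a set of words and that the probabilities are estimated as simple document frequencies; the function names, the toy corpus and the choice to score a pair that never co-occurs as 0 (instead of negative infinity) are assumptions for illustration, not part of the original method.

```python
import math

def pmi(word1, word2, docs):
    """PMI of formula (2), with probabilities estimated as document frequencies."""
    n = len(docs)
    p1 = sum(word1 in d for d in docs) / n
    p2 = sum(word2 in d for d in docs) / n
    p12 = sum(word1 in d and word2 in d for d in docs) / n
    if p12 == 0:
        return 0.0  # assumption: pairs that never co-occur contribute nothing
    return math.log(p12 / (p1 * p2))

def semantic_orientation(word, pos_seeds, neg_seeds, docs):
    """SO of formula (1): summed PMI with positive seeds minus negative seeds."""
    pos = sum(pmi(word, seed, docs) for seed in pos_seeds)
    neg = sum(pmi(word, seed, docs) for seed in neg_seeds)
    return pos - neg

# Toy corpus: each microblog is a set of tokens (hypothetical data).
docs = [
    {"great", "happy", "phone"},
    {"great", "good", "screen"},
    {"awful", "bad", "battery"},
    {"awful", "sad", "phone"},
]
so_great = semantic_orientation("great", ["happy", "good"], ["bad", "sad"], docs)
so_awful = semantic_orientation("awful", ["happy", "good"], ["bad", "sad"], docs)
```

In this toy corpus `so_great` is positive and `so_awful` is negative, so the two test words would be added to the positive and negative parts of the dictionary respectively.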

3.2 The Dictionary of Internet Slang

People usually use homophonic words, abbreviated words and network slang to express their opinions in social networks, and [14] has analysed the sentiment of Twitter data. Sometimes new words, produced by important events or news reports, are also used to express opinions. We therefore use the dictionary of internet slang from [12], which contains homophonic words, abbreviated words, network slang and many new words, to support microblog topic polarity classification. Table 3 shows part of the dictionary.

Table 3. Part of the dictionary of internet slang

3.3 The Dictionary of Emoticons

We construct the dictionary of emoticons by combining the emotional symbol libraries of microblog platforms with other statistical methods. The former are used to select obvious emotion symbols from microblog platforms such as Sina and Tencent. The latter select emoticons used in other social networks, including user-generated emoticons.

First, two laboratory members obtain the emotional symbol library, keep the emoticons to which they both assign the same sentiment polarity, and discard the emotional symbols with ambiguous polarity. The result is shown in Table 4.

Table 4. Part of the dictionary of emoticons

Second, to enrich the dictionary of emoticons, especially with user-generated emoticons from social networks, two laboratory members collect such emoticons and analyse their sentiment polarity. The result is shown in Table 5.

Table 5. Part of the dictionary of user-generated emoticons

To process the content conveniently, we pre-process all the microblogs and replace every emoticon with its “Meaning” by looking it up in the dictionary of emoticons.
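This pre-processing step amounts to a dictionary lookup and string substitution. The sketch below is only an illustration: the entries of `EMOTICON_MEANINGS` are hypothetical stand-ins for the rows of Tables 4 and 5, and the function name is an assumption.

```python
# Hypothetical entries; the real dictionary is the one built in this section.
EMOTICON_MEANINGS = {
    "T_T": "Cry",
    ":)": "Smile",
    ":(": "Sad",
}

def replace_emoticons(text, meanings=EMOTICON_MEANINGS):
    """Replace each known emoticon in the text with its 'Meaning' string."""
    # Handle longer emoticons first so a short one cannot split a longer one.
    for emoticon in sorted(meanings, key=len, reverse=True):
        text = text.replace(emoticon, " " + meanings[emoticon] + " ")
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())
```

With these hypothetical entries, `replace_emoticons("so tired T_T")` yields `"so tired Cry"`.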

4 The Cascaded Model

4.1 MB-LDA Model for Microblog Topic Mining

MB-LDA builds on LDA and models a microblog’s contact relation and document relation in a unified way, which makes it suitable for microblog topic mining. The parameters of the model are shown in Table 6.

Table 6. Parameter definition description

The Bayesian network of MB-LDA is shown in Fig. 2, where \( c \) and \( r \) represent the contact relation and the retweet relation respectively. First, MB-LDA draws the word distribution \( \varphi \) of each topic from a Dirichlet distribution with parameter \( \beta \). A conversation message in a microblog usually begins with “@”, and it is difficult to judge whether a message is a conversation when “@” appears in other positions, so in this paper we only consider the contact relation in microblogs beginning with “@”. When MB-LDA generates a microblog that begins with “@”, we regard it as a conversation message and set \( \pi_{c} = 1 \), draw the topic distribution \( \theta_{c} \) of the contact \( c \) from a Dirichlet distribution with parameter \( \alpha_{c} \), and assign \( \theta_{c} \) to the topic distribution \( \theta_{d} \) of the microblog \( d \). Otherwise we set \( \pi_{c} = 0 \) and directly draw the topic distribution \( \theta_{d} \) of the microblog \( d \) from a Dirichlet distribution with parameter \( \alpha \).

Fig. 2. Bayesian network of MB-LDA

Over the whole microblog collection, the probability of the topic distribution \( \theta \) is defined as follows:

$$ P\left( {\theta |\alpha ,\,\alpha_{c} ,\,c} \right)\, = \,P(\theta_{c} |\alpha_{c} )^{{\pi_{c} }} P(\theta_{d} |\alpha )^{{1 - \pi_{c} }} $$
(3)

Second, the retweet relation is identified as follows. If a microblog contains “//”, we denote the topic distribution of the retweeted microblog \( d_{RT} \) as \( \theta_{{d_{RT} }} \), draw \( r \) from a Bernoulli distribution with parameter \( \lambda \), and draw the topic \( z_{dn} \) of the current word from a multinomial distribution with parameter \( \theta_{{d_{RT} }} \) or \( \theta_{d} \), depending on \( r \). When “//” does not appear in the microblog, we set \( r\, = \,0 \) and draw the topic \( z_{dn} \) of the current word from the multinomial distribution with parameter \( \theta_{d} \). Finally, the specific word is drawn from the multinomial distribution with parameter \( \varphi_{{z_{dn} }} \). For more details about the MB-LDA model, see [15].

For a microblog, the joint probability distribution of all the words and their topics is as follows:

$$ P\left( {w,\,z |\lambda ,\,\theta ,\,\beta } \right)\, = \,P\left( {r |\lambda } \right)P\left( {z |\theta } \right)P\left( {w |z,\,\beta } \right)\, = \,P(r|\lambda )P\left( {z |\theta_{d} } \right)^{1 - r} P\left( {z |\theta_{{d_{RT} }} } \right)^{r} P(w|z,\,\beta ) $$
(4)
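The generative story behind formulas (3) and (4) can be sketched as a small sampler. This is an illustrative reading of the description above rather than the reference implementation of [15]: it assumes symmetric Dirichlet priors, takes the topic-word distributions `phi` as given instead of drawing them from the prior with parameter \( \beta \), and draws \( r \) once per word. All function and argument names are hypothetical.

```python
import random

def dirichlet(alphas, rng):
    """Sample from a Dirichlet distribution via normalised Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_microblog(n_words, phi, alpha, alpha_c, lam, is_conversation,
                       theta_rt=None, rng=None):
    """Generate (topic, word) index pairs for one microblog.

    phi[k] is the word distribution of topic k. theta_rt is the topic
    distribution of the retweeted microblog, or None when there is no "//".
    """
    rng = rng or random.Random()
    n_topics = len(phi)
    # Formula (3): pi_c = 1 uses the contact prior alpha_c, otherwise alpha.
    prior = alpha_c if is_conversation else alpha
    theta_d = dirichlet([prior] * n_topics, rng)
    pairs = []
    for _ in range(n_words):
        # r ~ Bernoulli(lambda) when a retweet relation exists, otherwise r = 0.
        r = 1 if theta_rt is not None and rng.random() < lam else 0
        # Formula (4): r = 1 selects theta_{d_RT}, r = 0 selects theta_d.
        theta = theta_rt if r == 1 else theta_d
        z = rng.choices(range(n_topics), weights=theta)[0]
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        pairs.append((z, w))
    return pairs
```

Inference in MB-LDA runs in the opposite direction, from observed words to the hidden topics; the sampler only makes explicit which distribution each variable is drawn from.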

4.2 Hierarchical Attention Network

Traditional approaches to text polarity classification represent documents with sparse lexical features, such as n-grams, and then apply a linear model or kernel methods to this representation. More recent approaches use deep learning, such as convolutional neural networks and recurrent neural networks based on long short-term memory (LSTM), to learn text representations. In this paper, a better sentiment representation is obtained by incorporating knowledge of the microblog structure into the attention network. Not all parts of a microblog are equally relevant for judging its polarity, and determining the relevant parts involves modeling the interactions of the words, not just their presence in isolation.

Words form sentences, and sentences form a document. For microblog polarity classification, we introduce the hierarchical attention network proposed by Zichao Yang et al. [16] into our cascaded model. Our intention is to let the network pay more or less attention to individual emotional factors when constructing the polarity classifier. The overall architecture is shown in Fig. 3. It consists of five parts: a word sequence encoder, a word-level attention layer, a sentence encoder, a sentence-level attention layer and a softmax layer. The details of these parts are described in [16], so we do not repeat them here.

Fig. 3. Hierarchical attention network
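Although the full architecture is left to [16], the core of an attention layer is compact enough to sketch. The code below illustrates the word-level attention of a hierarchical attention network in plain Python: each hidden state is passed through a one-layer transform, scored against a context vector, normalised with a softmax, and used to form a weighted sum. The function name and the toy values are assumptions for illustration; in practice the hidden states come from the Bi-RNN encoder and `W`, `b` and `context` are learned.

```python
import math

def attention_pool(hidden, W, b, context):
    """Return the attention weights and the attention-weighted sum of hidden states."""
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        # Score of u_t against the context vector.
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))
    # Softmax over the scores, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(dim)]
    return alphas, pooled

# Toy example: two 2-dimensional hidden states and an identity transform.
alphas, pooled = attention_pool(
    hidden=[[1.0, 0.0], [0.0, 1.0]],
    W=[[1.0, 0.0], [0.0, 1.0]],
    b=[0.0, 0.0],
    context=[1.0, 0.0],
)
```

Here the first hidden state aligns with the context vector, so it receives the larger weight; the sentence-level layer applies the same mechanism to sentence vectors.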

4.3 The Cascaded Model Architecture for Topic Polarity Classification

Although attention-network-based approaches to polarity classification have been quite effective, it is difficult for them to identify the topic and give the polarity towards that topic synchronously. We therefore combine the MB-LDA model and the attention network into the cascaded model, whose overall architecture is shown in Fig. 4. \( T_{{w_{i} }} \) denotes the probability that the word \( {\text{w}}_{\text{i}} \) belongs to the topic \( {\text{T}} \), where \( {\text{i}}\, \in \,\left[ {1,\,{\text{T}}} \right] \). The advantages of this architecture are as follows: (i) polarity classification is carried out on the basis of the results of topic recognition; (ii) the information input into the neural network takes into account the probability \( {\text{T}}_{{{\text{w}}_{\text{i}} }} \).

Fig. 4. The cascaded model architecture

The processing steps are as follows:

  (i) The MB-LDA model is used to obtain the topics of the microblog data sets and the top 50 sentiment words in each topic. These sentiment words are selected from the topic according to the dictionary of sentiment words.

  (ii) Both the microblogs and the topic probabilities of each sentiment word from the same topic are used as the input of the hierarchical attention network.

  (iii) The polarity of each microblog in each topic is classified in the softmax layer.
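Steps (i) and (ii) can be sketched as follows. The sketch assumes that MB-LDA has already produced, for one topic, a mapping from words to their topic probabilities; how these probabilities are combined with the word representations inside the network is not specified here, so pairing each token with its probability is only one possible reading. The function names and the toy values are hypothetical.

```python
def top_sentiment_words(topic_word_probs, sentiment_dict, k=50):
    """Step (i): the k most probable words of a topic that are in the sentiment dictionary."""
    candidates = [(word, prob) for word, prob in topic_word_probs.items()
                  if word in sentiment_dict]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

def attach_topic_probs(tokens, topic_words):
    """Step (ii): pair each token with its topic probability (0.0 for other words)."""
    probs = dict(topic_words)
    return [(token, probs.get(token, 0.0)) for token in tokens]

# Toy topic (hypothetical values); the paper uses k = 50.
topic = {"great": 0.3, "phone": 0.4, "awful": 0.1, "love": 0.2}
selected = top_sentiment_words(topic, {"great", "awful", "love"}, k=2)
network_input = attach_topic_probs(["love", "phone"], selected)
```

The resulting pairs are what would be passed, together with the microblog text, to the hierarchical attention network in step (ii).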

5 Experiments and Results

To quantitatively analyze the performance of the cascaded model, we run experiments on four real microblog topic data sets and analyze the accuracy of polarity classification, the influence of the number of topics on accuracy, and the influence of emoticons on accuracy.

5.1 Data Sets

The labeled data sets from NLP&CC 2012 and 2013, a total of 405 microblogs, are provided by Tencent Weibo and cover four topics: hui_rong_an, ipad, kang_ri_shen_ju_sample and ke_bi_sample. We keep the microblogs labeled with “opinionated = Y”, and the label “forward” stands for “//” (retweet) in a microblog. When the number of “polarity = ‘POS’” labels in a microblog is greater than or equal to the number of “polarity = ‘NEG’” labels, we consider the microblog positive; otherwise, it is negative. According to this polarity tagging, we randomly add corresponding emoticons to the microblogs to enrich the emotional characteristics of the data sets.
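The labeling rule can be written as a short function. This is a minimal sketch that assumes the sentence-level polarity labels of a microblog are available as a list of strings; the function name is hypothetical.

```python
def microblog_polarity(sentence_polarities):
    """Label a microblog positive when POS labels are at least as frequent as NEG labels."""
    pos = sentence_polarities.count("POS")
    neg = sentence_polarities.count("NEG")
    return "POS" if pos >= neg else "NEG"
```

Note that under this rule a tie between the two counts is resolved in favour of the positive class.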

To avoid over-fitting or under-fitting, we adopt 10-fold cross-validation in the experiments. That is, the data sets are randomly divided into 10 parts, of which 9 are used as the training set and the remaining one is used for testing. We repeat the process 10 times and take the average value.
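The cross-validation procedure can be sketched with the standard library alone. The helper names, the fixed random seed and the `evaluate` callback (which stands in for training and scoring a model on one split) are assumptions for illustration.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; each of the k folds serves as the test set once."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def cross_validate(items, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(items, k, seed)]
    return sum(scores) / len(scores)
```

With 405 microblogs, each test fold holds 40 or 41 items and the remaining items form the training set.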

In addition, to encode emoticons such as “T_T”, we carry out the corresponding string processing, for example replacing an emoticon with a string such as “Good”.

5.2 The Evaluation of Microblog Topic Polarity Classification

Polarity classification on microblog topic is evaluated by Precision, Recall and F-measure.

$$ \mathrm{Precision} = \frac{\#\mathrm{system\_correct}}{\#\mathrm{system\_proposed}} $$
(5)
$$ \mathrm{Recall} = \frac{\#\mathrm{system\_correct}}{\#\mathrm{person\_correct}} $$
(6)
$$ F\text{-}\mathrm{measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} $$
(7)

where #system_correct is the number of correct results returned by the system, #system_proposed is the total number of microblogs returned by the system, and #person_correct is the number of microblogs that have been annotated correctly by people. #weibo_topic is the number of microblogs containing topic words, and #weibo_total is the total number of microblogs in the collection.
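Formulas (5) to (7) translate directly into code. The sketch below takes the three counts as arguments; returning 0.0 when a denominator is zero is an assumption made here to keep the function total, and the function name is hypothetical.

```python
def precision_recall_f(system_correct, system_proposed, person_correct):
    """Compute Precision, Recall and F-measure from the three counts."""
    precision = system_correct / system_proposed if system_proposed else 0.0
    recall = system_correct / person_correct if person_correct else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

For instance, with 80 correct results out of 100 proposed by the system and 160 annotated by people, the precision is 0.8 and the recall is 0.5.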

5.3 Results

To evaluate the ability of microblog topic polarity recognition, and considering that the cascaded model is a semi-supervised learning model, we compare it with the most representative unsupervised learning model JST [17], the semi-supervised learning model SSA-ST [18] and the supervised learning model SVM on the four data sets for microblog topic polarity classification. The results of the experiment are shown in Table 7. Each value in the table is the average correct rate of each group of data.

Table 7. The comparison of polarity classification in 4 data sets

From Table 7, we can see that the precision of polarity classification with the cascaded model is higher than that of the unsupervised model JST and the semi-supervised model SSA-ST, and is close to that of the supervised model SVM. The reason is that our cascaded model has a strong ability to identify emotional characteristics, and we find that the attention network assigns higher weights to these features. This helps us quickly identify the key elements that affect the polarity of a microblog topic. Although the experimental results of the cascaded model are lower than those of SVM, the cascaded model can discover topics and achieve high polarity classification precision with less training data.

Because the cascaded model detects the topic and its polarity in microblog data sets synchronously, it is necessary to explore the interaction between polarity classification and topic detection. We therefore analyze experimentally how the number of topics affects the precision of polarity classification. The results of the experiment are shown in Fig. 5.

Fig. 5. The influence of the number of topics on the precision of polarity classification

As shown in Fig. 5, the number of topics generated by the cascaded model affects the results on the same data sets. An inappropriate number of topics reduces the precision of microblog polarity classification. Too few topics reduce the correlation between a topic and its polarity. Too many topics fragment a complete topic, which increases the noise in polarity classification and reduces the precision.

Emoticons can usually improve polarity classification effectively, so what is the quantitative correlation between the two? We gradually raise the number of microblogs containing emoticons in the four data sets, that is, we increase the proportion of microblogs with emoticons. The results of the experiment are shown in Fig. 6.

Fig. 6. The influence of the proportion of emoticons on the precision of polarity classification

Figure 6 shows that the precision of polarity classification increases with the number of emoticons in the microblogs. The different polarity classification models improve to different degrees when we increase the proportion of emoticons in the data sets, and for all models the precision shows a positive linear correlation with the proportion of emoticons. Building on the identified topics, the polarity classification performance of our cascaded model is clearly better.

6 Conclusions and Future Work

With the popularity of microblog services, people can see and share real-world events on microblog platforms. Mining the topic sentiment hidden in massive numbers of microblog messages can effectively assist users in making decisions. [19, 20] have introduced a number of different sentiment analysis methods for Twitter, and our approach is also suitable for Twitter. In this paper, the MB-LDA model and the attention network are applied to a Bi-RNN for topic-based microblog polarity classification, and the synchronized detection of the topic and its sentiment in microblogs is achieved.