1 Introduction

Sentiment analysis has attracted increasing research interest in recent years. The objective is to classify the sentiment polarity of a text as positive, negative, or neutral. A variety of approaches have been proposed for this task; representative ones include machine learning algorithms and neural network models. [1] apply machine learning techniques to the sentiment analysis problem. In this direction, most studies [2] focus on designing effective features to obtain better classification performance. Feature engineering is important but labor-intensive. Neural network models [3,4,5,6] are popular for their capacity to learn text representations from data without careful feature engineering.

Sentiment analysis is a special case of the text classification problem. For such tasks, neural network models take the text as input and generate semantic representations. Many neural network models have achieved excellent performance in sentiment analysis. However, these models focus only on the textual content and ignore the crucial characteristics of emoticons. In recent years, people have become increasingly fond of using emoticons when chatting or commenting online. It is common sense that emoticons significantly influence ratings, and they are an additional signal for extracting sentiment. For instance, a text containing “:)” is most likely to carry a positive emotion, whereas one containing “:(” is most likely to carry a negative emotion. Some work has used emoticons as noisy labels to learn classifiers from data [7, 8] or exploited emoticons in lexicon-based polarity classification [9]. However, these approaches only consider word-level preferences rather than the semantic level.

Attention has become an effective mechanism for obtaining superior results in a variety of NLP tasks such as machine translation [10], sentence summarization [11], and reading comprehension [12]. In this paper, we propose an attention mechanism based on emoticon information to enhance the sentiment representation and improve the classification performance of our model. We explore the potential correlation between emoticons and sentiment polarity in sentence-level sentiment analysis. To capture information in response to given sentences with emoticons, we design an attention-based Bi-LSTM. We evaluate our approach on the DouBan movie review dataset, which consists of short movie reviews, each containing one or more emoticons.

To summarize, our work makes the following contributions:

  1. Most existing algorithms for sentiment analysis focus only on textual information and do not make full use of emoticon information. We propose a neural network model with emoticon information for sentiment analysis, in which the information conveyed by emoticons is assumed to affect the surrounding text at the sentence level.

  2. We explore an attention mechanism based on emoticon information for sentiment analysis. Traditional attention-based neural network models only take local textual information into consideration. In contrast, our model puts forward the idea of emoticon attention by utilizing emoticon information.

  3. We build a corpus rich in emoticons from DouBan and use it as our experimental dataset to verify the effectiveness of our model. The experimental results demonstrate that our model is able to make better use of emoticon information and thus improve sentiment analysis performance.

2 Related Work

2.1 Sentiment Analysis

Sentiment analysis is a long-standing research topic; readers can refer to [13] for a recent survey. In this section, we describe related work on sentiment analysis.

The problem of sentiment analysis has been of great interest in the past decades because of its practical applicability. For example, a sentiment model could be employed to rank products and merchants [14]. In [15], Twitter sentiment was applied to predict election results. In [16], a method was reported for predicting comment volumes of political blogs. In [17], movie reviews and blogs were used to predict box-office revenues for movies. In [18], sentiment flow in social networks was investigated. In [19], expert investors in microblogs were identified and sentiment analysis of stocks was performed. In [20], sentiment analysis was used to characterize social relations. [21] used deep learning to predict movie reviews’ sentiment polarity.

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques classify text into affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored. Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, bag-of-words features, and Semantic Orientation-Pointwise Mutual Information [22]. Hybrid approaches combine machine learning with elements from knowledge representation to detect sentiment expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information but are implicitly linked to other concepts that do. With the rise of deep learning in computer vision, speech recognition, and natural language processing, neural models have been introduced into the sentiment analysis field owing to their ability to learn text representations.

2.2 Neural Network Models for Sentiment Analysis

Neural network models have achieved promising results for sentiment analysis owing to their ability to learn text representations. Three main families of neural models are used for sentiment analysis: recursive neural networks, convolutional neural networks, and recurrent neural networks. Socher et al. propose a series of recursive neural network models that learn representations based on the recursive tree structure of sentences, including the Recursive Autoencoder (RAE) [23], the Matrix-Vector Recursive Neural Network (MV-RNN) [4], and the Recursive Neural Tensor Network (RNTN) [24]. [6] and [25] adopt convolutional neural networks (CNNs) to learn sentence representations and achieve outstanding performance in sentiment analysis. [26] investigate tree-structured long short-term memory (LSTM) networks for sentiment analysis.

2.3 Emoticons

People’s facial expressions, such as laughing and weeping, are often considered involuntary ways of expressing oneself in face-to-face communication, whereas the use of their respective equivalents, emoticons such as “:-)” and “:-(”, in computer-mediated communication is intentional [27].

Although some work has taken the information conveyed by emoticons into account, most existing sentiment analysis models treat emoticons merely as noisy labels. For instance, [7, 8, 28] use positive and negative emoticons as noisy labels to construct training data for polarity classification. The basic assumption is that a text containing a positive emoticon is most likely to express a positive emotion, while one containing a negative emoticon is most likely to express a negative emotion. Given that emoticons are important cues for sentiment in text, the key to harvesting information from them lies in understanding how they relate to a text’s overall polarity. To address this issue, [9] exploit emoticons in lexicon-based polarity classification. Nevertheless, that approach only considers word-level preferences rather than the semantic level, and it ignores the interplay of emoticons and textual cues for sentiment, for instance when emoticons intensify sentiment that is already conveyed by the text. In contrast, we propose an efficient neural sentiment analysis model in which emoticons serve as attention, taking the interplay of emoticons and text into consideration.

3 Method

In this section we describe the proposed sentiment analysis model with emoticon attention. Figure 1 gives the overall architecture of our model. First, we use a Bidirectional Long Short-Term Memory (Bi-LSTM) network to learn the representation of input sentences, owing to its ability to capture both past and future information. Then, all emoticons in the sentence are extracted to enhance the sentence's semantic representation. Finally, the enhanced sentence representation is used as input to the sentiment classifier.

Fig. 1. Neural sentiment analysis model with emoticon attention.

3.1 Bi-directional Long Short-Term Memory Network

In this subsection, we describe the Bidirectional Long Short-Term Memory (Bi-LSTM) network for sentiment analysis. Recurrent neural networks (RNNs) are very useful models for dealing with language data. RNNs, particularly those with gated architectures such as the LSTM, are very powerful at capturing statistical regularities in sequential inputs. To learn the semantic representation of a sentence, we adopt a Bi-LSTM network as our sentiment analysis model.

Given an input sentence, we represent it as \( S = \left\{ {w_{1} ,w_{2} , \ldots ,w_{n} } \right\} \), where \( w_{j} \) is the \( j \)-th word and \( n \) is the length of the sentence. In the embedding layer, we embed each word into a low-dimensional semantic space; that is, each word \( w_{j} \) is mapped to its embedding \( x_{j} \in R^{d} \). We thus obtain \( \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\} \), the sequence of word vectors of the sentence. At time step \( j \), we feed the word vector \( x_{j} \) to the LSTM cell. More formally, given an input word vector \( x_{j} \), the current cell state \( c_{j} \) and hidden state \( h_{j} \) are updated from the previous cell state \( c_{j - 1} \) and hidden state \( h_{j - 1} \):

$$ \left[ {\begin{array}{*{20}c} {i_{j} } \\ {f_{j} } \\ {o_{j} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} \sigma \\ \sigma \\ \sigma \\ \end{array} } \right]\left( {W \cdot \left[ {h_{j - 1} ,x_{j} } \right] + b} \right) $$
(1)
$$ \hat{c}_{j} = tanh\left( {W \cdot \left[ {h_{j - 1} ,x_{j} } \right] + b} \right) $$
(2)
$$ c_{j} = f_{j} \otimes c_{j - 1} + i_{j} \otimes \hat{c}_{j} $$
(3)
$$ h_{j} = o_{j} \otimes tanh\left( {c_{j} } \right) $$
(4)
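As a concrete illustration of Eqs. (1)-(4), the following NumPy sketch performs a single LSTM step. The paper writes the same symbols \( W \) and \( b \) for the gates and the candidate state; here the corresponding parameters are stacked into one matrix, and all names and shapes are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_j, h_prev, c_prev, W, b):
    """One LSTM step following Eqs. (1)-(4).

    W has shape (4*d_h, d_h + d_x) and b has shape (4*d_h,), with rows
    stacked as [input gate; forget gate; output gate; candidate].
    """
    d_h = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_j]) + b   # affine map of [h_{j-1}, x_j]
    i_j = sigmoid(z[0 * d_h:1 * d_h])           # input gate,  Eq. (1)
    f_j = sigmoid(z[1 * d_h:2 * d_h])           # forget gate, Eq. (1)
    o_j = sigmoid(z[2 * d_h:3 * d_h])           # output gate, Eq. (1)
    c_hat = np.tanh(z[3 * d_h:4 * d_h])         # candidate cell state, Eq. (2)
    c_j = f_j * c_prev + i_j * c_hat            # cell state update, Eq. (3)
    h_j = o_j * np.tanh(c_j)                    # hidden state, Eq. (4)
    return h_j, c_j
```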

We use a bidirectional LSTM structure in the LSTM layer. Given an input sequence \( S_{1:n} \), the Bi-LSTM maintains two separate states for each input position \( j \): the forward state \( \vec{h}_{j} \) and the backward state \( \overleftarrow{h}_{j} \). The forward and backward states are generated by two different LSTM cells: the first LSTM is fed with the original input sequence, while the second is fed with the input sequence in reverse. As a result, we obtain the final \( h_{j} \) as follows:

$$ h_{j} = \left[ {\vec{h}_{j} ,\overleftarrow{h}_{j} } \right] $$
(5)
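A minimal PyTorch sketch of the Bi-LSTM encoder corresponding to Eq. (5) is given below; the vocabulary size, embedding dimension, and hidden size are illustrative placeholders, not the hyperparameters used in the paper.

```python
import torch
import torch.nn as nn

vocab_size, d_word, d_hidden = 10000, 200, 100            # illustrative sizes
embedding = nn.Embedding(vocab_size, d_word)              # word embedding layer
bi_lstm = nn.LSTM(d_word, d_hidden,
                  batch_first=True, bidirectional=True)   # forward + backward LSTMs

word_ids = torch.randint(0, vocab_size, (32, 40))         # a batch of 32 sentences of length 40
x = embedding(word_ids)                                   # (32, 40, d_word)
h, _ = bi_lstm(x)                                         # (32, 40, 2*d_hidden): [h_fwd ; h_bwd], Eq. (5)
```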

3.2 Emoticon Attention

We introduce emoticon attention to obtain semantic-level representations and exploit the interplay of emoticons and textual cues for sentiment. Emoticons have different effects on different words in a sentence. Hence, instead of feeding the hidden states to an average pooling layer, we adopt an emoticon attention mechanism to extract emoticon-specific words that are important to the meaning of the sentence. We then aggregate the representations of those informative words to form the sentence representation. Formally, the enhanced sentence representation is a weighted sum of the hidden states:

$$ s = \sum\nolimits_{j = 1}^{n} {\alpha_{j} h_{j} } $$
(6)

where \( \alpha_{j} \) measures the importance of the \( j \)-th word for the current emoticons. Each input sentence contains \( k \) emoticons \( \left\{ {e_{1} ,e_{2} , \ldots ,e_{k} } \right\} \). We embed each emoticon as a continuous, real-valued vector \( e_{j} \in R^{{d_{e} }} \), where \( d_{e} \) is the dimension of the emoticon embeddings. The vectors of all emoticons in the sentence are then averaged, yielding an emoticon representation \( e \in R^{{d_{e} }} \). Thus, the attention weight \( \alpha_{j} \) for each hidden state is defined as:

$$ \alpha_{j} = \frac{{\exp \left( {att\left( {h_{j} ,e} \right)} \right)}}{{\sum\nolimits_{k = 1}^{n} {\exp \left( {att\left( {h_{k} ,e} \right)} \right)} }} $$
(7)

where \( att \) is an attention function that scores the importance of words for composing the sentence representation. It is defined as:

$$ att\left( {h_{j} ,e} \right) = v^{T} \tanh \left( {W_{H} h_{j} + W_{E} e + b} \right) $$
(8)

where \( W_{H} \) and \( W_{E} \) are weight matrices, \( v \) is a weight vector, and \( v^{T} \) denotes its transpose.
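The emoticon attention of Eqs. (6)-(8) can be sketched as a small PyTorch module; the class and dimension names are ours, and the averaged emoticon vector \( e \) is assumed to be computed beforehand from the emoticon embeddings of the sentence.

```python
import torch
import torch.nn as nn

class EmoticonAttention(nn.Module):
    """Sketch of the emoticon attention in Eqs. (6)-(8)."""
    def __init__(self, d_hidden, d_emo, d_att):
        super().__init__()
        self.W_H = nn.Linear(2 * d_hidden, d_att, bias=False)  # W_H in Eq. (8)
        self.W_E = nn.Linear(d_emo, d_att, bias=True)          # W_E and b in Eq. (8)
        self.v = nn.Linear(d_att, 1, bias=False)               # v^T in Eq. (8)

    def forward(self, h, e):
        # h: (batch, n, 2*d_hidden) Bi-LSTM states; e: (batch, d_emo) averaged emoticon vector
        scores = self.v(torch.tanh(self.W_H(h) + self.W_E(e).unsqueeze(1)))  # att(h_j, e), Eq. (8)
        alpha = torch.softmax(scores, dim=1)                                 # attention weights, Eq. (7)
        s = (alpha * h).sum(dim=1)                                           # weighted sum, Eq. (6)
        return s, alpha.squeeze(-1)
```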

3.3 Sentiment Classification

Since the sentence representation \( s \) is hierarchically extracted from the words in the sentence, it is a high-level representation of the sentence. Hence, we regard it as the feature vector for sentence-level sentiment classification and use a non-linear layer to project it into the target space of \( C \) classes:

$$ \hat{s} = \tanh \left( {W_{c} s + b_{c} } \right) $$
(9)

Afterwards, we use a softmax layer to obtain the sentence sentiment distribution:

$$ p_{c} = \frac{{\exp \left( {\hat{s}_{c} } \right)}}{{\sum\nolimits_{k = 1}^{C} {\exp \left( {\hat{s}_{k} } \right)} }} $$
(10)

where \( C \) is the number of sentiment classes and \( p_{c} \) is the predicted probability of sentiment class \( c \). In our model, the cross-entropy error between the gold sentiment distribution and the predicted sentiment distribution is used as the loss function for training:

$$ L = - \sum\nolimits_{s \in S} {\sum\nolimits_{c = 1}^{C} {p_{c}^{g} \left( s \right) \cdot \log \left( {p_{c} \left( s \right)} \right)} } $$
(11)
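A brief PyTorch sketch of Eqs. (9)-(11) is shown below; note that `F.cross_entropy` applies the softmax of Eq. (10) internally, so feeding it \( \hat{s} \) reproduces the loss of Eq. (11) for one-hot gold distributions. The dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_sent, C = 200, 5                          # sentence representation size, number of classes
W_c = nn.Linear(d_sent, C)                  # projection layer of Eq. (9)

s = torch.randn(32, d_sent)                 # a batch of enhanced sentence representations
labels = torch.randint(0, C, (32,))         # gold sentiment classes

s_hat = torch.tanh(W_c(s))                  # Eq. (9)
p = F.softmax(s_hat, dim=-1)                # Eq. (10): predicted sentiment distribution
loss = F.cross_entropy(s_hat, labels)       # Eq. (11): cross-entropy against the gold labels
```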

4 Experiments

In this section, we describe the experimental settings and present empirical results and analysis.

4.1 Data

In order to study the role of emoticons in the sentiment analysis of text, we constructed a dataset of DouBan movie reviews (DBMR), which is the first Chinese sentiment corpus with rich emoticons.

DBMR is collected from DouBan, the largest movie review website in China. We crawled about 9 million raw reviews from douban.com. In addition, we built a common emoticon set containing about 100 commonly used emoticons. We cleaned the data while keeping all emoticons in the text, applying a series of strict rules such as removing reviews with fewer than 3 words. We then extracted 47,250 samples containing emoticons from the original data as our experimental dataset (DBMR).
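The paper does not list the full set of cleaning rules, so the following sketch is only illustrative of the filtering step: it drops short reviews and keeps those containing an emoticon, with a stand-in emoticon set and word segmentation assumed to be done separately.

```python
EMOTICONS = {":)", ":(", ":-)", ":-(", "= ="}     # stand-in for the ~100-item emoticon set

def keep_review(text, tokens):
    """text: raw review string; tokens: its word-segmented form."""
    if len(tokens) < 3:                           # drop reviews with fewer than 3 words
        return False
    return any(e in text for e in EMOTICONS)      # keep only reviews containing emoticons
```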

In our emoticon set, each emoticon carries a different emotional meaning, such as the laughing emoticon ‘:)’ and the weeping emoticon ‘:(’. Some emoticons carry more complex meanings, such as ‘= =’, which can mean awkward, indifferent, or sarcastic. Each movie review contains one or more emoticons. Some samples are shown in Table 1; to help non-Chinese readers understand the examples, we also translate the reviews into English. In Table 1, the proportion column gives the distribution of the different emoticons.

Table 1. DBMR samples.

The sentiment labels of the DBMR dataset range from 1 to 5, as scored by the reviewer: 1 means very negative, 2 negative, 3 neutral, 4 positive, and 5 very positive. Our task is to identify the sentiment score of a movie review.

We split the data into train, validation, and test sets in the proportion 8:1:1 (80% for training, 10% for validation, and 10% for test). The validation set is used to find the optimal parameters of the model. Statistics of the dataset are shown in Table 2.

Table 2. DBMR dataset statistics.
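The 8:1:1 split described above could be realized as in the sketch below; the shuffling and random seed are our assumptions, since the paper does not specify how the split was drawn.

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and split into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```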

4.2 Metrics

We employ the standard accuracy rate [29] to measure the overall sentiment analysis performance; the higher the accuracy, the better the performance. The accuracy rate is defined as:

$$ Accuracy = \frac{T}{N} $$
(12)

where \( T \) is the number of predicted sentiment labels that equal the ground-truth label and \( N \) is the total number of texts.
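Eq. (12) amounts to the following small helper (names are ours):

```python
def accuracy(predicted, gold):
    """Eq. (12): T correct predictions out of N texts."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)  # T
    return correct / len(gold)                                   # N = len(gold)
```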

4.3 Baseline

In order to fully assess the effectiveness of our model, we compare it with the following methods, which are widely used as baselines in other sentiment analysis work.

Majority Method:

It is a basic baseline that assigns the majority sentiment label of the training set to each instance in the test set. This method has been widely used as a baseline in sentiment analysis tasks [30].

SVM:

We implement the standard SVM method with tf-idf word representations as a baseline for our task. This method has been widely used as a baseline in sentiment analysis tasks [31].

NB:

We implement the standard Naive Bayes method with tf-idf word representations as a baseline for our task.
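One plausible scikit-learn implementation of the tf-idf SVM and Naive Bayes baselines is sketched below; the paper does not state the SVM variant or the tokenization, so the linear SVM and pre-segmented (whitespace-tokenized) Chinese text are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# tf-idf features feeding a linear SVM and a multinomial Naive Bayes classifier
svm_baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
nb_baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())

# Usage (hypothetical variable names):
# svm_baseline.fit(train_texts, train_labels)
# predictions = svm_baseline.predict(test_texts)
```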

LSTM/Bi-LSTM:

Long Short-Term Memory [32] and its bidirectional variant, as introduced previously; the bidirectional LSTM can capture both past and future information.

CNN:

Convolutional Neural Network [33] generates sentence representations by convolution and pooling operations.

4.4 Model Comparison and Analysis

The experimental results are shown in Table 3. We divide them into two parts: the left part considers only the textual information, while the right part considers the text together with the emoticons.

Table 3. Sentiment analysis results.

Majority is just a simple statistical method without any semantic analysis, so it is no wonder that it performs worst. Moreover, the results show that the neural network models, which form the basic implementation of our model, significantly outperform the traditional machine learning algorithms, indicating the effectiveness of deep neural networks.

In addition, the right part of Table 3 shows the performance of the models with emoticon information. From this part, we can see that emoticon information is helpful for sentiment analysis. For example, with such information on DouBan, LSTM achieves a 2.1% improvement, Bi-LSTM a 0.9% improvement, and CNN a 1.2% improvement.

In our experiments, most of the baseline models achieve around 42% accuracy, but we argue that basic network structures such as CNNs or RNNs find it difficult to make full use of the emoticons in a sentence and to fully extract its emotional information. We account for the importance of emoticons and adopt an attention mechanism to extract emoticon-specific words that are important to the emotional attributes of the sentence. Our proposed Bi-LSTM model with emoticon attention (Bi-LSTM + EA) outperforms all the other baselines, indicating that our model incorporates emoticon information in an effective and efficient way.

5 Conclusion and Future Work

In this paper, we propose a neural network that incorporates emoticon information via semantic-level attention. With the emoticon attention, our model can take the emoticons into account at the semantic level. In experiments, we evaluate our model on the sentiment analysis task, and the results show that it achieves significant and consistent improvements over other popular models. In this paper, we only consider emoticon information; in fact, most movies also come with background information such as the director and actors. We will take advantage of such information for sentiment analysis in future work.