Abstract
Sentiment analysis is one of the most important research directions in natural language processing. People increasingly use emoticons in text to express their sentiment. However, most existing sentiment classification algorithms focus only on textual information and do not make full use of emoticon information. To address this issue, we propose a novel LSTM architecture with emoticon attention that incorporates emoticon information into sentiment analysis. The emoticon attention uses emoticons to capture crucial semantic components. To evaluate the effectiveness of our model, we build the first sentiment corpus with rich emoticons from a movie review website and use it as our experimental dataset. Experimental results show that our approach makes better use of emoticon information and improves performance on sentiment analysis.
1 Introduction
Sentiment analysis has attracted increasing research interest in recent years. The objective is to classify the sentiment polarity of a text as positive, negative or neutral. A variety of approaches have been proposed for this task. Representative approaches at present include machine learning algorithms and neural network models. Pang et al. [1] apply machine learning techniques to the sentiment analysis problem. Following this direction, most studies [2] focus on designing effective features to obtain better classification performance. Feature engineering is important but labor-intensive. Neural network models [3,4,5,6] are popular for their capacity to learn text representations from data without careful feature engineering.
Sentiment analysis is a special case of the text classification problem. For such tasks, neural network models take the text as input and generate semantic representations. Many neural network models have achieved excellent performance in sentiment analysis. However, these models focus only on the text content and ignore the crucial characteristics of emoticons. In recent years, people have become increasingly fond of using emoticons when chatting or commenting online. It is common sense that emoticons strongly influence ratings and that they are an additional cue for extracting sentiment. For instance, a text containing “:)” most likely expresses a positive emotion, whereas a text containing “:(” most likely expresses a negative one. Some work has used emoticons as noisy labels to learn classifiers from data [7, 8] or exploited emoticons in lexicon-based polarity classification [9]. However, these approaches only consider word-level preference rather than the semantic level.
Attention has become an effective mechanism for obtaining superior results in a variety of NLP tasks such as machine translation [10], sentence summarization [11], and reading comprehension [12]. In this paper, we propose an attention mechanism based on emoticon information to enhance the sentiment representation and improve the classification performance of our model. We explore the potential correlation between emoticons and sentiment polarity in sentence-level sentiment analysis. To capture the information relevant to the emoticons in a given sentence, we design an attention-based Bi-LSTM. We evaluate our approach on the DouBan movie review dataset, which consists of short movie reviews, each containing one or more emoticons.
To summarize, our work makes the following contributions:
(1) Most existing algorithms for sentiment analysis focus only on textual information and do not make full use of emoticon information. We propose a neural network model with emoticon information for sentiment analysis, and we assume that the information conveyed by emoticons affects the surrounding text at the sentence level.
(2) We explore an attention mechanism based on emoticon information for sentiment analysis. Traditional attention-based neural network models only take the local text information into consideration. In contrast, our model introduces emoticon attention, which utilizes the emoticon information.
(3) We build a corpus with rich emoticons from DouBan and use it as our experimental dataset to verify the effectiveness of our model. The experimental results demonstrate that our model makes better use of emoticon information and improves performance on sentiment analysis.
2 Related Work
2.1 Sentiment Analysis
Sentiment analysis is a long-standing research topic. Readers can refer to [13] for a recent survey. In this section, we describe some related work on sentiment analysis.
The problem of sentiment analysis has been of great interest in the past decades because of its practical applicability. For example, a sentiment model could be employed to rank products and merchants [14]. In [15], Twitter sentiment was applied to predict election results. In [16], a method was reported for predicting comment volumes of political blogs. In [17], movie reviews and blogs were used to predict box-office revenues for movies. In [18], sentiment flow in social networks was investigated. In [19], expert investors in microblogs were identified and sentiment analysis of stocks was performed. In [20], sentiment analysis was used to characterize social relations. In [21], deep learning was used to predict the sentiment polarity of movie reviews.
Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored. Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, “bag of words” and Semantic Orientation-Pointwise Mutual Information [22]. Hybrid approaches leverage both machine learning and elements from knowledge representation to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information but are implicitly linked to other concepts that do so. With the rise of deep learning in computer vision, speech recognition and natural language processing, neural models have been introduced into the sentiment analysis field because of their ability to learn text representations.
2.2 Neural Network Models for Sentiment Analysis
Neural network models have achieved promising results for sentiment analysis because of their ability to learn text representations. There are three main families of neural network models for sentiment analysis: recursive neural networks, convolutional neural networks and recurrent neural networks. Socher et al. propose a series of recursive neural network models that learn representations based on the recursive tree structure of sentences, including the Recursive Autoencoder (RAE) [23], the Matrix-Vector Recursive Neural Network (MV-RNN) [4] and the Recursive Neural Tensor Network (RNTN) [24]. Kim [6] and Santos and Gatti [25] adopt convolutional neural networks (CNN) to learn sentence representations and achieve outstanding performance in sentiment analysis. Tai et al. [26] investigate tree-structured long short-term memory (LSTM) networks for sentiment analysis.
2.3 Emoticons
People’s facial expressions, such as laughing and weeping, are often considered involuntary ways of expressing oneself in face-to-face communication, whereas the use of their respective emoticon equivalents, such as “:-)” and “:-(”, in computer-mediated communication is intentional [27].
Even though some work has taken into account the information conveyed by emoticons, most existing sentiment analysis models treat emoticons merely as noisy labels. For instance, [7, 8, 28] use emoticons such as “:)” and “:(” as noisy labels to construct training data for polarity classification. The basic assumption is that a text containing “:)” most likely expresses a positive emotion and a text containing “:(” a negative one. Given that emoticons are important cues for sentiment in text, the key to harvesting information from emoticons lies in understanding how they relate to a text’s overall polarity. To address this issue, [9] exploits emoticons in lexicon-based polarity classification. Nevertheless, it only considers word-level preference rather than the semantic level and ignores the interplay of emoticons and textual cues for sentiment, for instance in cases where emoticons are used to intensify sentiment that is already conveyed by the text. In contrast, we propose an efficient neural sentiment analysis model in which emoticons serve as attention, taking the interplay of emoticons and textual cues into consideration.
3 Method
We describe the proposed sentiment analysis model with emoticon attention in this section. Figure 1 gives the overall architecture of our model. First, we use a Bidirectional Long Short-Term Memory (Bi-LSTM) network to learn the representation of input sentences, because it captures both past and future information. Next, all emoticons in the sentence are extracted to enhance the semantic representation of the sentence. Finally, the enhanced sentence representation is used as the input for sentiment classification.
3.1 Bi-directional Long Short-Term Memory Network
In this subsection, we describe the Bidirectional Long Short-Term Memory (Bi-LSTM) network for sentiment analysis. Recurrent neural networks (RNNs) are very useful models for dealing with language data. RNNs, particularly ones with gated architectures such as the LSTM, are very powerful at capturing statistical regularities in sequential inputs. To learn the semantic representation of a sentence, we adopt a Bi-LSTM network as our sentiment analysis model.
Given an input sentence, we represent it as \( S = \left\{ {w_{1} ,w_{2} , \ldots ,w_{n} } \right\} \), where \( w_{j} \) is the \( j \)-th word in sentence \( S \) and \( n \) is the length of the sentence. In the embedding layer, we embed each word of the sentence into a low-dimensional semantic space; that is, each word \( w_{j} \) is mapped to its embedding \( x_{j} \in R^{d} \). We thus obtain the sequence \( \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\} \) of word vectors for the sentence. At time step \( j \), the word vector \( x_{j} \) is the input of the LSTM cell. More formally, given an input word vector \( x_{j} \), the current cell state \( c_{j} \) and hidden state \( h_{j} \) are updated from the previous cell state \( c_{j - 1} \) and hidden state \( h_{j - 1} \).
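The state update described above is not spelled out in the text. A minimal NumPy sketch of one common LSTM formulation follows, assuming input, forget and output gates \( i_{j}, f_{j}, o_{j} \) and a candidate state \( g_{j} \), with \( c_{j} = f_{j} \odot c_{j - 1} + i_{j} \odot g_{j} \) and \( h_{j} = o_{j} \odot \tanh (c_{j}) \). The parameter names `W`, `U`, `b`, the gate ordering and the toy dimensions are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_j, h_prev, c_prev, W, U, b):
    """One LSTM time step (a common formulation, assumed here).

    W: (4*m, d), U: (4*m, m), b: (4*m,), with the four blocks stacked in the
    order input gate, forget gate, output gate, candidate state.
    """
    m = h_prev.shape[0]
    z = W @ x_j + U @ h_prev + b          # all pre-activations at once
    i = sigmoid(z[0:m])                   # input gate
    f = sigmoid(z[m:2 * m])               # forget gate
    o = sigmoid(z[2 * m:3 * m])           # output gate
    g = np.tanh(z[3 * m:4 * m])           # candidate cell state
    c_j = f * c_prev + i * g              # new cell state
    h_j = o * np.tanh(c_j)                # new hidden state
    return h_j, c_j

# Toy run over a sentence of n word vectors (random stand-ins for embeddings).
rng = np.random.default_rng(0)
d, m, n = 4, 3, 5
W = rng.normal(scale=0.1, size=(4 * m, d))
U = rng.normal(scale=0.1, size=(4 * m, m))
b = np.zeros(4 * m)
xs = rng.normal(size=(n, d))
h, c = np.zeros(m), np.zeros(m)
for x_j in xs:
    h, c = lstm_step(x_j, h, c, W, U, b)
```

Stacking the four gate blocks into a single matrix product is only a convenience; separate matrices per gate would be equivalent.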
We use a bidirectional LSTM structure in the LSTM layer. Given an input sequence \( S_{1:n} \), the Bi-LSTM maintains two separate states for each input position \( j \): the forward state \( \overrightarrow{h}_{j} \) and the backward state \( \overleftarrow{h}_{j} \). The forward and backward states are generated by two different LSTM cells. The first LSTM is fed with the original input sequence, while the second LSTM is fed with the input sequence in reverse. The final \( h_{j} \) is then obtained by combining the forward and backward states at position \( j \).
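The two-direction scheme can be sketched as follows. The sketch assumes the two states are combined by concatenation, which is a common choice but is not confirmed by the text, and it uses a plain tanh recurrent cell as a stand-in for the LSTM cell to keep the example short. The helper names `run_direction`, `bidirectional` and `make_tanh_cell` are hypothetical.

```python
import numpy as np

def run_direction(xs, step, m):
    """Run a recurrent cell over xs and return the hidden state at every position."""
    h = np.zeros(m)
    states = []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return np.stack(states)                              # shape (n, m)

def bidirectional(xs, step_fwd, step_bwd, m):
    fwd = run_direction(xs, step_fwd, m)                 # reads x_1 .. x_n
    bwd = run_direction(xs[::-1], step_bwd, m)[::-1]     # reads x_n .. x_1, re-aligned to positions
    return np.concatenate([fwd, bwd], axis=1)            # assumed combination: concatenation, (n, 2m)

def make_tanh_cell(d, m, seed):
    # Stand-in cell (plain tanh RNN); an LSTM cell would be used in the model.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(m, d))
    U = rng.normal(scale=0.1, size=(m, m))
    return lambda x, h: np.tanh(W @ x + U @ h)

xs = np.random.default_rng(0).normal(size=(5, 4))        # 5 word vectors of dimension 4
H = bidirectional(xs, make_tanh_cell(4, 3, 1), make_tanh_cell(4, 3, 2), m=3)
```

Each row of `H` then summarizes the words before and after the corresponding position, which is the property the section relies on.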
3.2 Emoticon Attention
We introduce emoticon attention to obtain a semantic-level representation and to exploit the interplay of emoticons and textual cues for sentiment. Emoticons have different effects on different words in the sentence. Hence, instead of feeding the hidden states to an average pooling layer, we adopt an emoticon attention mechanism to extract the emoticon-specific words that are important to the meaning of the sentence. Finally, we aggregate the representations of those informative words to form the sentence representation. Formally, the enhanced sentence representation \( s \) is a weighted sum of the hidden states, \( s = \sum\nolimits_{j = 1}^{n} {\alpha_{j} h_{j} } \).
Here, \( \alpha_{j} \) measures the importance of the \( j \)-th word for the current emoticons. Each input sentence contains \( k \) emoticons \( \left\{ {e_{1} ,e_{2} , \ldots ,e_{k} } \right\} \). We embed each emoticon as a continuous, real-valued vector \( e_{i} \in R^{{d_{e} }} \), where \( d_{e} \) is the dimension of the emoticon embeddings. We then average these vectors to obtain a single emoticon representation \( e \in R^{{d_{e} }} \). The attention weight \( \alpha_{j} \) for each hidden state is obtained by normalizing the attention scores \( att(h_{j} ,e) \) over all positions.
Here, \( att \) is an attention function that scores the importance of each word for composing the sentence representation. It is parameterized by the weight matrices \( W_{H} \) and \( W_{E} \) and a vector \( v \), where \( v^{T} \) denotes the transpose of \( v \).
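The attention step can be sketched in NumPy as below. The exact formulas are not given in the text, so the sketch assumes an additive score \( att(h_{j} ,e) = v^{T} \tanh (W_{H} h_{j} + W_{E} e) \) and a softmax normalization \( \alpha_{j} = \exp (att(h_{j} ,e))/\sum\nolimits_{l} {\exp (att(h_{l} ,e))} \); the tanh non-linearity, the absence of a bias term and the toy dimensions are assumptions.

```python
import numpy as np

def emoticon_attention(H, E, W_H, W_E, v):
    """Attention-weighted sentence vector (sketch under the stated assumptions).

    H: (n, d_h) hidden states, E: (k, d_e) emoticon embeddings,
    W_H: (a, d_h), W_E: (a, d_e), v: (a,).
    """
    e = E.mean(axis=0)                                   # average the k emoticon vectors
    scores = np.tanh(H @ W_H.T + W_E @ e) @ v            # att(h_j, e) for every position j
    scores = scores - scores.max()                       # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()        # softmax over positions
    s = alpha @ H                                        # weighted sum of hidden states
    return s, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))      # 6 words, hidden size 4
E = rng.normal(size=(2, 3))      # 2 emoticons, embedding size 3
W_H = rng.normal(size=(5, 4))
W_E = rng.normal(size=(5, 3))
v = rng.normal(size=5)
s, alpha = emoticon_attention(H, E, W_H, W_E, v)
```

When all scores are equal, the weights become uniform and `s` reduces to average pooling, which is the alternative the section argues against.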
3.3 Sentiment Classification
Since the sentence representation \( s \) is hierarchically extracted from the words in the sentence, it is a high-level representation of the sentence. Hence, we regard it as the feature vector for sentence sentiment classification. We use a non-linear layer to project the sentence representation \( s \) into the target space of \( C \) classes.
Afterwards, we use a softmax layer to obtain the sentence sentiment distribution.
Here, \( C \) is the number of sentiment classes and \( p_{c} \) is the predicted probability of sentiment class \( c \). In our model, the cross-entropy error between the gold sentiment distribution and the predicted sentiment distribution is used as the loss function for optimization during training.
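A minimal sketch of the classification head is given below. It assumes the non-linear layer is \( \hat{s} = \tanh (W_{c} s + b_{c} ) \), that the softmax is \( p_{c} = \exp (\hat{s}_{c} )/\sum\nolimits_{k = 1}^{C} {\exp (\hat{s}_{k} )} \), and that the gold distribution is one-hot, so the cross-entropy reduces to the negative log-probability of the gold class. The names `W_c` and `b_c` and the choice of tanh are assumptions.

```python
import numpy as np

def classify(s, W_c, b_c):
    """Project a sentence vector to C classes and return a probability distribution."""
    s_hat = np.tanh(W_c @ s + b_c)           # assumed non-linear projection, shape (C,)
    z = np.exp(s_hat - s_hat.max())          # shifted exponentials for stability
    return z / z.sum()                       # softmax

def cross_entropy(p, gold):
    """Cross-entropy against a one-hot gold label given as a class index."""
    return -np.log(p[gold])

rng = np.random.default_rng(0)
C, d_s = 5, 4                                # 5 sentiment classes, toy sentence size
s = rng.normal(size=d_s)
p = classify(s, rng.normal(size=(C, d_s)), np.zeros(C))
loss = cross_entropy(p, gold=3)              # index 3 would correspond to rating 4
```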
4 Experiments
In this section, we describe the experiment settings and give empirical results and analysis.
4.1 Data
To study the role of emoticons in the sentiment analysis of text, we constructed a dataset of DouBan movie reviews (DBMR), which is the first Chinese sentiment corpus with rich emoticons.
DBMR is collected from the DouBan site, which is the largest movie review website in China. We crawled about 9 million raw records from douban.com. In addition, we built an emoticon set that contains about 100 commonly used emoticons. We cleaned the data according to a series of strict rules, such as removing reviews with fewer than 3 words, while keeping all emoticons in the text. We then extracted 47,250 samples containing emoticons from the original data as our experimental dataset (DBMR).
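The filtering step might look like the sketch below. Only two of the cleaning rules are stated in the text (a minimum of 3 words and the presence of an emoticon), so the sketch covers just those. The three-entry `EMOTICONS` set is a hypothetical stand-in for the roughly 100-entry set, whitespace splitting stands in for a Chinese word segmenter, and counting words without the emoticons is an assumption.

```python
# Hypothetical stand-in for the emoticon set described in the text.
EMOTICONS = {":)", ":(", "= ="}

def contains_emoticon(text, emoticons=EMOTICONS):
    return any(e in text for e in emoticons)

def keep_review(text, min_words=3, emoticons=EMOTICONS):
    """Keep a review if it has an emoticon and at least min_words other tokens."""
    stripped = text
    for e in emoticons:
        stripped = stripped.replace(e, " ")   # do not count emoticons as words (assumption)
    # Whitespace tokenization is a placeholder; Chinese text needs a word segmenter.
    return contains_emoticon(text, emoticons) and len(stripped.split()) >= min_words

reviews = [
    "great movie overall :)",
    "bad :(",
    "a long review without any emoticon",
    "so so ending = =",
]
kept = [r for r in reviews if keep_review(r)]
```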
In our emoticon set, each emoticon carries a different emotional meaning, such as the laughing emoticon ‘:)’ and the weeping emoticon ‘:(’. Some of these emoticons carry more complex meanings, such as ‘= =’, which means awkward, indifferent or satiric. Each of the movie reviews contains one or more emoticons. Some samples are shown in Table 1. To help non-Chinese readers understand the examples, we also provide English translations of the reviews. In Table 1, the proportion column gives the distribution of the different emoticons.
The sentiment label of the DBMR dataset ranges from 1 to 5 and is given by the reviewer: 1 means very negative, 2 means negative, 3 means neutral, 4 means positive and 5 means very positive. Our task is to identify the sentiment score of a movie review.
We split the data into training, validation and test sets in the proportion 8:1:1 (80% for training, 10% for validation, and 10% for test). The validation set is used to find the optimal parameters for the model. The statistics of the dataset are shown in Table 2.
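An 8:1:1 split can be sketched as follows. Shuffling with a fixed seed before splitting is an assumption, since the text does not say how samples were assigned to the three sets, and the function name `split_dataset` is hypothetical.

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle and split samples into train/validation/test in the proportion 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)        # fixed seed for a reproducible split
    n_train = len(items) * 8 // 10            # integer arithmetic avoids rounding surprises
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]
    return train, val, test_set

# 47,250 is the dataset size reported in the text; integers stand in for reviews.
train, val, test_set = split_dataset(range(47250))
```

With 47,250 samples this yields 37,800 training, 4,725 validation and 4,725 test samples.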
4.2 Metrics
We employ the standard accuracy [29] to measure the overall sentiment analysis performance; the higher the accuracy, the better the performance. Accuracy is defined as \( {\text{Accuracy}} = {\text{T}}/{\text{N}} \),
where \( {\text{T}} \) is the number of texts whose predicted sentiment label equals the ground-truth label and \( {\text{N}} \) is the total number of texts.
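As a small worked example of the metric, the sketch below counts the matching labels and divides by the number of texts. The function name `accuracy` and the toy labels are illustrative.

```python
def accuracy(predicted, gold):
    """Fraction of texts whose predicted label equals the gold label (T / N)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("accuracy is undefined for an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))   # T
    return correct / len(gold)                               # T / N

# Three of the four predicted ratings match the gold ratings.
acc = accuracy([1, 2, 3, 4], [1, 2, 3, 5])
```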
4.3 Baseline
To fully assess the effectiveness of our model, we compare it with the following methods, which are widely used as baselines in other sentiment analysis work.
Majority Method:
This is a basic baseline method that assigns the majority sentiment label in the training set to each instance in the test set. This method has been widely used as a baseline in sentiment analysis tasks [30].
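The majority baseline is simple enough to sketch in a few lines. The function name and the toy labels are illustrative.

```python
from collections import Counter

def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test instance."""
    label, _ = Counter(train_labels).most_common(1)[0]
    return [label] * n_test

# Rating 4 is the most frequent training label, so it is predicted everywhere.
predictions = majority_baseline([4, 4, 5, 3, 4, 1], n_test=3)
```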
SVM:
We implement the standard SVM method with tf-idf word representations as a baseline in our task. This method has been widely used as a baseline in sentiment analysis tasks [31].
NB:
We implement the standard Naive Bayes method with tf-idf word representations as a baseline in our task.
LSTM/Bi-LSTM:
Long Short-Term Memory [32] and its bidirectional variant. As introduced previously, the Bidirectional LSTM can capture both past and future information.
CNN:
Convolutional Neural Network [33] generates sentence representation by convolution and pooling operations.
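The convolution and pooling operations of such a baseline can be sketched as below. The window width, the ReLU activation and max-over-time pooling are assumptions based on common sentence CNNs; the baseline's actual configuration is not given in the text.

```python
import numpy as np

def conv_max_pool(X, filters, bias):
    """Sentence vector from 1-D convolution over word windows and max-over-time pooling.

    X: (n, d) word vectors, filters: (f, w, d), bias: (f,). Requires n >= w.
    """
    n, d = X.shape
    f, w, _ = filters.shape
    # Flatten every window of w consecutive word vectors into one row.
    windows = np.stack([X[i:i + w].ravel() for i in range(n - w + 1)])          # (n-w+1, w*d)
    feature_maps = np.maximum(0.0, windows @ filters.reshape(f, -1).T + bias)   # ReLU (assumed)
    return feature_maps.max(axis=0)                                             # one value per filter

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])    # 3 words, dimension 2
rep = conv_max_pool(X, np.ones((1, 2, 2)), np.zeros(1))
```

In this toy run the single all-ones filter sums each two-word window, giving 2 and 3, and pooling keeps the larger value.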
4.4 Model Comparison and Analysis
The experimental results are shown in Table 3. We divide the results into two parts: the left part considers only the text information, and the right part considers the text together with emoticons.
Majority is a simple statistical method without any semantic analysis, so it is no surprise that it performs worst. Moreover, the results show that neural network models, which form the basis of our model, significantly outperform traditional machine learning algorithms. This indicates the effectiveness of deep neural networks.
In the right part of Table 3, we show the performance of models with emoticon information. This part shows that emoticon information is helpful for sentiment analysis. For example, when such information is considered on the DouBan dataset, LSTM achieves a 2.1% improvement, Bi-LSTM a 0.9% improvement and CNN a 1.2% improvement.
In our experiments, most of the baseline models achieve an accuracy of around 42%. We argue that basic network structures such as CNN or RNN can hardly make full use of the emoticons in a sentence or fully extract its emotional information. We therefore take the importance of emoticons into account and adopt an attention mechanism to extract the emoticon-specific words that are important to the emotional attributes of the sentence. Our proposed Bi-LSTM model with emoticon attention (Bi-LSTM + EA) outperforms all the baseline methods. This indicates that our model incorporates emoticon information in an effective and efficient way.
5 Conclusion and Future Work
In this paper, we propose a neural network that incorporates emoticon information via semantic-level attention. With the emoticon attention, our model can take emoticons into account at the semantic level. In experiments, we evaluate our model on the sentiment analysis task. The experimental results show that our model achieves significant and consistent improvements compared to other popular models. In this paper, we only consider the emoticon information. In fact, most movies have background information such as director and actors. We will take advantage of that information for sentiment analysis in future work.
References
1. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Empirical Methods in Natural Language Processing, pp. 79–86 (2002)
2. Mohammad, S.M., Kiritchenko, S., Zhu, X.: NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242 (2013)
3. Socher, R., Bauer, J., Manning, C.D.: Parsing with compositional vector grammars. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol. 1: Long Papers, pp. 455–465 (2013)
4. Socher, R., Huval, B., Manning, C.D., Ng, A.Y.: Semantic compositionality through recursive matrix-vector spaces. In: Empirical Methods in Natural Language Processing, pp. 1201–1211 (2012)
5. Socher, R., Lin, C.C., Manning, C.D., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 129–136 (2011)
6. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP (2014)
7. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, vol. 1 (2009)
8. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC (2010)
9. Hogenboom, A., Bal, D., Frasincar, F., Bal, M., de Jong, F., Kaymak, U.: Exploiting emoticons in sentiment analysis. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 703–710 (2013)
10. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
11. Rush, A.M., Chopra, S., Weston, J.: A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685 (2015)
12. Hermann, K.M., Kocisky, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., et al.: Teaching machines to read and comprehend. In: Advances in Neural Information Processing Systems, pp. 1693–1701 (2015)
13. Liu, B.: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, Cambridge (2015)
14. McGlohon, M., Glance, N.S., Reiter, Z.: Star quality: aggregating reviews to rank products and merchants. In: ICWSM (2010)
15. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with twitter: what 140 characters reveal about political sentiment. In: ICWSM, vol. 10, pp. 178–185 (2010)
16. Yano, T., Smith, N.A.: What’s worthy of comment? Content and comment volume in political blogs. In: ICWSM (2010)
17. Sadikov, E., Parameswaran, A.G., Venetis, P.: Blogs as predictors of movie success. In: ICWSM (2009)
18. Miller, M., Sathi, C., Wiesenthal, D., Leskovec, J., Potts, C.: Sentiment flow through hyperlink networks. In: ICWSM (2011)
19. Feldman, R., Rosenfeld, B., Bar-Haim, R., Fresko, M.: The stock sonar - sentiment analysis of stocks based on a hybrid approach. In: Twenty-Third IAAI Conference (2011)
20. Groh, G., Hauffa, J.: Characterizing social relations via NLP-based sentiment analysis. In: ICWSM (2011)
21. Li, C., Xu, B., Wu, G., He, S., Tian, G., Zhou, Y.: Parallel recursive deep model for sentiment analysis. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS (LNAI), vol. 9078, pp. 15–26. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18032-8_2
22. Turney, P.D.: Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417–424 (2002)
23. Socher, R., Pennington, J., Huang, E., Ng, A.Y., Manning, C.D.: Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Empirical Methods in Natural Language Processing, pp. 151–161 (2011)
24. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)
25. Santos, C.N.D., Gatti, M.A.D.C.: Deep convolutional neural networks for sentiment analysis of short texts. In: International Conference on Computational Linguistics, pp. 69–78 (2014)
26. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. In: ACL (2015)
27. Kendon, A.: On gesture: its complementary relationship with speech. In: Nonverbal Behavior and Communication, pp. 65–97 (1987)
28. Rao, D., Ravichandran, D.: Semi-supervised polarity lexicon induction. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 675–682 (2009)
29. Jurafsky, D.: Speech & Language Processing. Pearson Education India, Noida (2000)
30. Tang, D., Qin, B., Feng, X., Liu, T.: Target-dependent sentiment classification with long short term memory. CoRR, abs/1512.01100 (2015)
31. Chen, H., Sun, M., Tu, C., Lin, Y., Liu, Z.: Neural sentiment classification with user and product attention
32. Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Empirical Methods in Natural Language Processing, pp. 1724–1734 (2014)
33. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Li, C., Liu, P. (2019). Sentiment Analysis Based on LSTM Architecture with Emoticon Attention. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_21
DOI: https://doi.org/10.1007/978-3-030-26142-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science (R0)