Knowledge-Based Systems

Volume 152, 15 July 2018, Pages 70-82
Sentiment classification with word localization based on weakly supervised learning with a convolutional neural network

https://doi.org/10.1016/j.knosys.2018.04.006

Abstract

To maximize the applicability of sentiment analysis results, it is necessary not only to classify the overall sentiment (positive/negative) of a given document but also to identify the main words that contribute to the classification. However, most datasets for sentiment analysis only have the sentiment label for each document or sentence. In other words, there is a lack of information about which words play an important role in sentiment classification. In this paper, we propose a method for identifying key words that discriminate positive and negative sentences by using a weakly supervised learning method based on a convolutional neural network (CNN). In our model, each word is represented as a continuous-valued vector and each sentence is represented as a matrix whose rows correspond to the vectors of the words used in the sentence. Then, the CNN model is trained using these sentence matrices as inputs and the sentiment labels as the output. Once the CNN model is trained, we implement the word attention mechanism that identifies the words contributing most to the classification results with a class activation map, using the weights from the fully connected layer at the end of the learned CNN model. To verify the proposed methodology, we evaluated the classification accuracy and the rate of polarity words among high-scoring words using two movie review datasets. Experimental results show that the proposed model can not only correctly classify the sentence polarity but also successfully identify the corresponding words with high polarity scores.

Introduction

Sentiment analysis and opinion mining is a field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing (NLP) and has also been widely studied in data mining, Web mining, and text mining [1], [2], [3], [4]. Application domains for sentiment analysis include analyses of customer response to new products or services, analyses of public opinion towards the government's new policies or political issues under debate, etc. [5]. In response to increasing needs in diverse domains, various sentiment analysis techniques have been developed [6], [7], [8], [9].

Beyond research purposes, understanding a human speaker's intention is one of the basic building blocks of artificial intelligence (AI). Global technology leaders have therefore released their own AI speakers, such as Amazon's “Echo,” Google's “Google Home,” and Apple's “HomePod,” to collect real-world conversational data in order to upgrade their AI engines. As these AI speakers process the human speaker's query at a sentence level, it becomes more critical to correctly identify the main intentions (words/phrases) of the speaker, which is the ultimate goal of sentiment analysis models.

However, many current sentiment analysis techniques suffer from the over-abstraction problem [6]: the only information obtained from these techniques is the polarity of the document, i.e., whether the nuance of the document is positive or negative. It is difficult to obtain more in-depth sentiment analysis results, such as identifying the main words contributing to the polarity classification or finding words or phrases that oppose the overall sentiment of the document, i.e., negative words/phrases in a positive document or positive words/phrases in a negative document. In other words, the model is constrained to learn polarity scores for words or phrases without word- or phrase-level labels.

To resolve the over-abstraction problem, recurrent neural network (RNN)-based attention models have been developed. These usually introduce an additional attention mechanism above the hidden state of the standard RNN structure, so that the extent to which each word contributes to the overall polarity can be discovered [10]. Hence, the attention mechanism, which computes the similarities between all hidden states of input sequences and the current test state, helps the NLP model to focus on salient words/phrases, and transfer these attentions to other machine learning models to solve more complicated tasks such as document classification [11], parsing [12], machine translation [13], [14], and image captioning [15]. Recently, various methodologies such as attentive LSTM and tensor fusion networks have been proposed in the NLP field [16], [17], [18].
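
As a minimal illustration of the idea described above, the following sketch computes attention weights as softmax-normalized similarities between a sequence of hidden states and a single query state. The dot-product similarity, the function name `attention_weights`, and the random toy inputs are assumptions made for illustration; the cited models use a variety of scoring functions and learned parameters.

```python
import numpy as np

def attention_weights(hidden_states, query):
    """Softmax over dot-product similarities between each hidden state and the query.

    hidden_states: array of shape (n_words, hidden_dim)
    query:         array of shape (hidden_dim,)
    """
    scores = hidden_states @ query      # one similarity score per word
    scores = scores - scores.max()      # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()      # weights are non-negative and sum to 1

# Toy inputs standing in for RNN hidden states (hypothetical values).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5, 8))   # 5 words, 8-dimensional hidden states
query = rng.normal(size=8)

weights = attention_weights(hidden_states, query)
context = weights @ hidden_states         # attention-weighted summary vector
```

Words with larger weights contribute more to `context`, which is how such models expose the relative importance of each word.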

Although attention-based RNN models can identify significant words, learning the attention weights presents a further computational burden. In the field of computer vision, it is possible to localize significant regions for a predicted class without learning additional parameters by using a convolutional neural network (CNN) structure. Although CNN was originally developed for image processing tasks [19], [20], [21], [22], [23], [24], it has been successfully adopted to solve various NLP tasks such as document classification and sentiment analysis [25], [26], [27], [28], [29], [30]. Oquab, Bottou, Laptev and Sivic [31] proposed the class activation map (CAM) technique, which uses only the class information of an object in an image to localize objects without a bounding box, which has been considered mandatory for object localization. Because CAM locates significant regions indirectly using the weights between the final hidden layer and the output layer of CNN, the learning of additional structures or parameters is not required.
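
For reference, the CAM computation can be written compactly in the commonly used formulation that places global average pooling before the output layer. The symbols are introduced here for illustration and are not taken from the text above: $f_k(x, y)$ is the activation of the $k$-th feature map in the final convolutional layer at position $(x, y)$, $w_k^{c}$ is the output-layer weight connecting feature map $k$ to class $c$, and $Z$ is the number of spatial positions.

```latex
M_c(x, y) = \sum_{k} w_k^{c} \, f_k(x, y),
\qquad
S_c = \sum_{k} w_k^{c} \, \frac{1}{Z} \sum_{x, y} f_k(x, y)
    = \frac{1}{Z} \sum_{x, y} M_c(x, y)
```

Under this assumption the class score $S_c$ is the spatial average of the map $M_c$, so the map reuses the classifier's own weights and no additional parameters need to be learned.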

In this paper, we propose a sentiment classification model with word localization based on weakly supervised learning with a convolutional neural network (CNN), named CAM2: Classification and locAlization Model with a Class Activation Map. The proposed model performs localization on the input data (document) using CNN and CAM. Similar to attention-based RNN models, the proposed model can identify semantically significant words in the document and provide this information to the predicted class. The main advantage of the proposed model is its ability to efficiently identify crucial words or phrases in a sentence from the sentiment classification perspective without explicit word- or phrase-level sentiment polarity information. It identifies the words by weak labels only, i.e., the sentence-level polarity that is more abstracted but easily available. In the proposed model, words are embedded in a fixed-size continuous vector space using Word2Vec [32], GloVe [33], and FastText [34]. Sentences are represented in a matrix form, whose rows correspond to word vectors, and they are used as the input of a CNN model. The CNN model is trained by considering the sentence-level sentiment polarity as the target, and it produces both the sentence-level polarity score and word-level polarity scores for all words in the sentence, which helps us understand the result of sentence-level sentiment classification. Unlike attention models with RNNs, there is no need to separately learn the weights for the attention. Considering that the same word is used in different contexts for different domains, it is relatively easy to build a dictionary that reflects the characteristics of each domain by using the proposed model.
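
The pipeline described above (sentence matrix, convolution over words, class scores, and word-level scores derived from the output-layer weights) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names, the toy dimensions, the zero padding, the use of global average pooling, and the random arrays standing in for word embeddings and trained parameters are all assumptions.

```python
import numpy as np

def conv_feature_maps(sentence, filters):
    """Apply each filter to every window of consecutive words, then ReLU.

    sentence: (n_words, embed_dim) matrix whose rows are word vectors
    filters:  (n_filters, width, embed_dim)
    Zero padding keeps one activation per word, so maps align with words.
    """
    n_words, _ = sentence.shape
    n_filters, width, _ = filters.shape
    pad_left = (width - 1) // 2
    pad_right = width - 1 - pad_left
    padded = np.pad(sentence, ((pad_left, pad_right), (0, 0)))
    maps = np.empty((n_filters, n_words))
    for t in range(n_words):
        window = padded[t:t + width]                          # (width, embed_dim)
        maps[:, t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(maps, 0.0)

def classify_and_localize(sentence, filters, fc_weights):
    """Return sentence-level class scores and per-word class scores."""
    maps = conv_feature_maps(sentence, filters)   # (n_filters, n_words)
    pooled = maps.mean(axis=1)                    # global average pooling (assumed)
    logits = fc_weights @ pooled                  # (n_classes,)
    word_scores = fc_weights @ maps               # (n_classes, n_words)
    return logits, word_scores

# Toy stand-ins for embeddings and trained parameters (hypothetical values).
rng = np.random.default_rng(0)
sentence = rng.normal(size=(6, 4))      # 6 words, 4-dimensional embeddings
filters = rng.normal(size=(3, 3, 4))    # 3 filters spanning 3 words each
fc_weights = rng.normal(size=(2, 3))    # 2 classes (negative, positive)

logits, word_scores = classify_and_localize(sentence, filters, fc_weights)
predicted = int(np.argmax(logits))
top_word = int(np.argmax(word_scores[predicted]))   # index of the highest-scoring word
```

With average pooling, each class score equals the mean of that class's word scores, so the word-level scores decompose the sentence-level prediction without any extra learned weights.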

The rest of this paper is organized as follows. In Section 2, we briefly review and discuss related work. In Section 3, we describe the architecture of the proposed model. Detailed experimental settings are described in Section 4, followed by the analysis and discussion of the results. Finally, in Section 5, we present our conclusions.

Related work

In this section, we briefly review representative studies on CNN-based document classification [25], weakly supervised learning for CNN-based localization [31], [35], the RNN-based document attention model named the hierarchical attention network [11], and other weakly supervised learning approaches in various tasks [36], [37], [38], [39], [40].

Overall framework

Fig. 4 shows the overall framework of the proposed method. After collecting the sentences, low-level embedding is performed by the Word2Vec, GloVe, and FastText methods, and the word vectors in the sentence are concatenated to form the initial input matrix for the CNN. Once the CNN model training is completed, the polarity of a given test sentence is predicted. Then, the weights of the fully connected layer are used to combine the feature maps to produce the CAM2 score for every single word in

Datasets & target labeling

To verify the proposed CAM2, we used two sets of movie reviews, one written in English and the other written in Korean. Not only do movie reviews have explicit sentiment labels (ratings or stars), but they generally also have more subjective expressions compared to other formal texts such as news articles. For the English movie review dataset, we used the publicly available IMDB dataset [41], while Korean movie reviews were collected directly from the WATCHA website (https://watcha.net/), which

Classification performance

Tables 5 and 6 list the mean and standard deviation of each performance measure for each model over the 30 repetitions. HAN yielded the second best ACC/BCR for the IMDB dataset, but resulted in the worst ACC and only the fifth best BCR for the WATCHA dataset. A statistical two-sample hypothesis test indicates that HAN resulted in a higher performance for the IMDB dataset (English and relatively longer reviews), while the CAM2 model achieved a higher performance for the WATCHA dataset (Korean

Conclusion

In this paper, we propose CAM2, a classification and locAlization model with a class activation map, which is a sentiment classification model with word localization based on weakly supervised CNN learning. Although the proposed model is trained based on class labels only, it can not only predict the overall sentiment of a given sentence but also find important polarity words that contribute significantly to the predicted class. Compared to the previous CNN-based text classification model, CAM2

Acknowledgement

This research was supported by (1) the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B03930729) and (2) an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2017-0-00349, Development of Media Streaming system with Machine Learning using QoE (Quality of Experience)).

References (41)

  • T. Nasukawa et al.

    Sentiment analysis: capturing favorability using natural language processing

  • S. Hochreiter et al.

    Long short-term memory

    Neural Comput.

    (1997)
  • Z. Yang et al.

    Hierarchical attention networks for document classification

  • O. Vinyals et al.

    Grammar as a foreign language

  • M.-T. Luong, H. Pham and C.D. Manning, Effective approaches to attention-based neural machine translation,...
  • D. Bahdanau, K. Cho and Y. Bengio, Neural machine translation by jointly learning to align and translate,...
  • K. Xu et al.

    Show, attend and tell: neural image caption generation with visual attention

  • Y. Wang et al.

    Attention-based LSTM for aspect-level sentiment classification

  • S. Chopra et al.

    Abstractive sentence summarization with attentive recurrent neural networks

  • A. Zadeh, M. Chen, S. Poria, E. Cambria and L.-P. Morency, Tensor fusion network for multimodal sentiment analysis,...