Computer Communications

Volume 157, 1 May 2020, Pages 102-115

Multi-source social media data sentiment analysis using bidirectional recurrent convolutional neural networks

https://doi.org/10.1016/j.comcom.2020.04.002

Abstract

Subjectivity detection in text is essential for sentiment analysis, which requires techniques able to perceive unanticipated means of communication. Few approaches have managed to capture syntactic, semantic, and contextual sentiment information via distributed word representations (DWRs). This paper concatenates DWRs through a weighted mechanism over Recurrent Neural Network (RNN) variants joined with a Convolutional Neural Network (CNN) that distinctively involves weighted attentive pooling (WAP), whereas CNNs with traditional pooling operations require many layers merely to capture enough features. Our approach empowers sentiment analysis over DWRs, comprising Word2vec, FastText, and GloVe, to produce a dense efficient concatenated representation (DECR) that holds long-term dependencies on a single RNN layer, guided by Part-of-Speech (POS) tagging restricted to verbs, adverbs, and nouns only. The representations thus gained are input to a CNN containing a single convolution layer engaging WAP on multi-source social media data, to handle the issues of syntactic and semantic regularities as well as out-of-vocabulary (OOV) words. Experiments demonstrate that the DWRs, together with the proposed concatenation, resolve the mentioned issues under moderate hyper-parameter configurations. Without stacking multiple layers, our architecture achieves an accuracy of 89.67% with DECR-Bi-GRU-CNN (WAP) on IMDB, compared to 81.11% with random initialization on SST.

Introduction

Social media has developed phenomenally over the past decade; platforms such as Twitter, Facebook, LinkedIn, WeChat, and Instagram are used extensively. For instance, Twitter has 321 million active users [1]. On these platforms, people express their opinions, suggestions, likes, dislikes, and experiences in a profoundly casual manner. This informal style of interaction motivates researchers to explore valuable insights using tools and methods such as speech recognition and sentiment analysis. It is also associated with several challenging tasks concerning syntactic, lexical, and semantic regularities.

In the 1990s, discrete vector-space representations were among the popular natural-language representation models. They include Bag-of-Words (BoW), Term Frequency–Inverse Document Frequency (TF–IDF), and count vectorizers, all used for feature extraction. However, BoW proved problematic with respect to both semantics and word ordering. Consolidations such as Continuous Bag-of-Words (CBOW) and N-gram models mostly perform best [2] as statistical language models for word prediction, as described in [3]; further improvement came from incorporating sampling and additional sub-sampling of continuous words, proposed in [4]. Later, the dense word-representation technique in [5], known as word embedding, proved useful for handling the information sparsity of n-gram models. For sentiment detection, it is essential to understand the meaning of words and their usage; it is therefore better to transform words into a vector-space model whose dimensions serve as features, as used in many areas such as document classification [6] and information retrieval [7]. Natural Language Processing (NLP) techniques detect subjectivity in sentences by focusing on content and phonetics, relying on pattern recognition and learning over words.
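
For illustration only (not from the paper), a minimal scikit-learn sketch contrasts these discrete representations; note that neither preserves word order or semantics:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the movie was great", "the plot was dull and the acting was worse"]

# Bag-of-Words: raw term counts; word order and semantics are lost
bow = CountVectorizer().fit(docs)
print(bow.get_feature_names_out())
print(bow.transform(docs).toarray())

# TF-IDF: down-weights terms that occur in many documents
tfidf = TfidfVectorizer().fit(docs)
print(tfidf.transform(docs).toarray().round(2))
```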

With the advancement of artificial intelligence (AI), deep neural networks (DNNs) help algorithms learn by overcoming information sparsity and make it feasible to train complex models on relatively large data sets, as presented in several works [8], [9], [10]. To capture semantic symmetries, neural networks (NNs) employ word embeddings that place grammatically or semantically related words near one another in vector space. This has broad applicability in tasks such as machine translation and, in particular, sentiment analysis, which exploits such relations in the vector space. A predominant word embedding model utilizing a convolutional neural network (CNN), designed by Collobert and Weston (C&W), is presented in [11].

Moreover, to capture syntactic and semantic relations, numerous deep-learning word-embedding models such as GloVe [12] and Word2Vec [3] are utilized. These embedding models have received much appreciation from sentiment-analysis researchers and reveal useful deep-learning methods for constructing vector representations of words; they are used in [13], [14], [15], [16]. However, these representations are problematic to some extent: training a vector for each word does not suit small corpora and requires large ones, as referenced in [17]. Similarly, researchers with small datasets have to rely on pre-trained DWRs such as GloVe and Word2Vec, which fall short in a number of situations, such as time complexity, computational cost, and effectiveness, as presented in [18], [19].

Furthermore, these techniques can map dissimilar words to nearby vectors by disregarding the sentiment information in the text, which ultimately compromises model precision, and their word-vector calculations ignore the context of the document. To capture the morphological forms of words, a variation of Word2Vec called FastText was introduced in [20]. It yields proficient vector representations of uncommon words by concentrating on the inner structure of words; learning through FastText improves syntactic tasks because word morphology is related to syntactic behavior. Also, the word vectors must represent the vocabulary in a compact format.
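
As a hedged illustration (the file names below are placeholders for publicly released pre-trained files, not artifacts of this paper), all three DWRs can be loaded with gensim; FastText composes vectors for out-of-vocabulary words from character n-grams:

```python
from gensim.models import KeyedVectors
from gensim.models.fasttext import load_facebook_vectors

# Placeholder paths to standard pre-trained releases
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
glove = KeyedVectors.load_word2vec_format("glove.6B.300d.w2v.txt")  # GloVe after glove2word2vec conversion
ft = load_facebook_vectors("cc.en.300.bin")

print(w2v["good"].shape)            # (300,)
print(ft["unforgettableness"][:5])  # OOV word, built from subword n-grams
```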

For text classification, Part-of-Speech (POS) tagging is an effective method because it provides rich information about a word and its neighbors. To manage syntactic regularities, it splits sentences into words and assigns each word a tag, such as adverb, noun, adjective, or verb, according to POS-labeling rules [21]. It can also discover similarities and dissimilarities between words, an excellent source for sentiment prediction. Although embeddings and language models have been useful in many tasks, they still cannot capture word connections within a sentence, which dependency parsing needs in order to establish relations between words. Consequently, Word2Vec relies on associated words to understand word combinations [22] as well as syntactic and semantic regularities, irrespective of word occurrence; GloVe, by contrast, is considered exceptional at interpreting sentiment through word meaning, as presented in [23].
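
A minimal sketch of such filtering, using NLTK as an assumed toolkit (the paper does not prescribe one), keeping only verbs, adverbs, and nouns as the abstract describes:

```python
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

KEEP = ("VB", "RB", "NN")  # Penn Treebank tag prefixes: verbs, adverbs, nouns

def pos_filter(sentence):
    # Tokenize, tag, and retain only the sentiment-bearing POS classes
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [word for word, tag in tagged if tag.startswith(KEEP)]

print(pos_filter("The service was surprisingly good and the staff responded quickly."))
# e.g. ['service', 'was', 'surprisingly', 'staff', 'responded', 'quickly']
```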

In machine learning, many methodologies have been proposed for text classification, for example, KNN, Naïve Bayes, Support Vector Machines (SVM), Decision Trees, and K-Means. In contrast, neural-network models have produced better word representations than conventional techniques that deal with semantics, such as Latent Semantic Analysis. Among neural networks, CNNs and recurrent neural networks (RNNs) have produced better results, especially on text-classification tasks involving sentiment detection. For instance, comprehensive experimental work in [24] improves tree-based semantic dependency using RNN variants and LSTM (long short-term memory). To encode character input into high-level features, a mix of neural networks combining CNNs and RNNs is executed to learn the input sequence accordingly [25]. To extract higher-level features, CNNs [26] must rely on several convolution layers to capture long-term dependencies because of the locality of convolution and pooling. This problem can be addressed with a recurrent neural network, where a single layer is capable of holding long-term dependencies. Overall, neural-network-based methodologies [27] take vector representations as input and involve several layers to predict sentiment.

This work carries out sentiment analysis with three aims. The first is the search for DWRs sufficient to address the above problems, through GloVe, Word2Vec, and FastText with subword information. The second is the generation of an efficient sentence vector based on the concatenation of these DWRs with POS tagging, to improve the evaluation metrics. The last is a joint neural architecture comprising Bi-LSTM (bidirectional long short-term memory) and Bi-GRU (bidirectional gated recurrent unit), which processes text as sequential information in both directions to hold long-term dependencies, combined with a CNN, since the latter bounds the receptive fields of the hidden layer regardless of position. Additionally, the convolutional layers extract local features that are expedient for sentiment prediction. Our methodology also differs from many recent approaches in using windows of various weights and lengths to adopt long-term dependencies bidirectionally, while the CNN, with attentive pooling, efficiently moves from left to right instead of over the entire expression by taking the adequate position of the encoded features.
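
A minimal Keras sketch of our reading of this joint architecture; the pooling layer here is a generic learned attention weighting standing in for WAP, and all sizes are assumptions rather than the paper's settings:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, EMB_DIM, MAXLEN = 20000, 300, 200  # assumed sizes

inp = layers.Input(shape=(MAXLEN,))
# In the proposed method the embedding matrix would be the concatenated DECR;
# here it is randomly initialized for brevity
x = layers.Embedding(VOCAB, EMB_DIM)(inp)
x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)  # single Bi-GRU layer
x = layers.Conv1D(128, kernel_size=3, activation="relu")(x)          # single convolution layer

# Attentive-pooling stand-in: a learned scalar score per time step,
# softmax-normalized over time and used to weight the feature sequence
scores = layers.Dense(1, activation="tanh")(x)
weights = layers.Softmax(axis=1)(scores)
pooled = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

out = layers.Dense(1, activation="sigmoid")(pooled)  # binary sentiment (e.g. IMDB)
model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```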

Following are the main contributions of this work:

1. In respect of unsupervised learning, we evaluate the effectiveness of neural language models for initializing distributed word representations enriched with sentiment-related words.

2. To improve the efficacy of word embeddings, we propose an efficient word-representation approach named dense efficient concatenated representation (DECR), based on sequential concatenation and a POS tagger, to generate a novel representation (see the sketch after this list).

3. These representations, together with DECR and appropriate hyperparameters, serve as input to the proposed architecture, primarily a bidirectional RNN layer, in several variants, for learning long-term dependencies. This further feeds a CNN using a weighted attentive operation with windows of various lengths and weights to generate the feature maps, achieving competitive results while tuning only a few parameters.
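
A hedged sketch of the concatenation behind DECR as described above; the uniform weights and zero-vector fallback are placeholders, not the paper's exact weighted mechanism:

```python
import numpy as np

DIM = 300  # assumed per-model dimensionality

def decr_vector(word, w2v, glove, ft, weights=(1/3, 1/3, 1/3)):
    """Weighted concatenation of three DWRs into one dense vector.

    w2v, glove, ft map a word to a DIM-sized vector; FastText can also
    cover OOV words via subword n-grams. Missing words fall back to zeros.
    """
    parts = []
    for model, w in zip((w2v, glove, ft), weights):
        vec = model[word] if word in model else np.zeros(DIM, dtype=np.float32)
        parts.append(w * vec)
    return np.concatenate(parts)  # shape: (3 * DIM,)
```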

The rest of the paper is structured as follows. Section 2 presents related work. The proposed joint neural network is described in Section 3, and Section 4 contains the experiments. The discussion is given in Section 5. Finally, the paper is concluded in Section 6.

Section snippets

Related work

In respect of NLP, a considerable measure of relevant research has been identified with sentence-classification tasks for sentiment analysis. NLP models incorporate associations between the atomic, discrete symbols defined as words for perceiving information. In the most recent two decades, there has been a transformation of words into continuous vectors, introduced by [28], [29], [30]. It is observed that the representation of words through vectors gets exceptional

Bi-directional recurrent convolutional architecture for sentiment analysis

This section presents a theoretical model of the proposed methodology in the following order: a pre-processing phase, which incorporates the dataset description and its pre-processing along with the representation of pre-trained word vectors through Word2vec [3], GloVe [12], and FastText with subword information (SW) [20]; then a new DECR utilizing bidirectional long short-term memory and gated recurrent units (Bi-LSTM, Bi-GRU). In the post-processing phase, a CNN comprised of pooling to

Experimentations

With the objective of examining the considered techniques, our work builds an effective sentiment-analysis classifier consisting of RNN variants and a CNN. The integration process includes the multiple DWRs mentioned above, as well as an evaluation of the influence of the associated hyperparameters. It has been observed that results obtained without testing numerous configurations (hyperparameter tuning) affect the training of the network with respect to abundant hardware, which eventually influences the overall
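
As an illustrative sketch only (the values below are assumptions, not the paper's grid), a moderate hyper-parameter search over such a network can be expressed as:

```python
from itertools import product

# Assumed search space; the paper's exact configurations are not reproduced
grid = {
    "rnn_units": [64, 128],
    "conv_filters": [64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 5e-4],
}

for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    print(config)  # e.g. build, train, and record validation accuracy per config
```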

Discussion

Recently, adequate improvements have been made through deep-learning approaches on numerous NLP tasks. Our work has promoted the accessibility of DWRs in the form of pre-trained word vectors derived from expanded corpora over various domains. We investigate the quality of these representations for performing sentiment analysis, observing factors such as the training method and training-corpus size that determine the quality of the proposed method. For sentiment analysis, it is significant to

Conclusion

We conclude our work with specific results. Initially, we utilized domain-specific pre-processing techniques and POS tagging on multi-source datasets to strengthen the efficiency of distributed word representations, which outperform vocabulary sense. Further, we have described the use of different word representations, including Word2Vec, GloVe, and fastText, mixing the representations through a weighted mechanism rather than a single learning model. We perceived that models trained

CRediT authorship contribution statement

Fazeel Abid: Conceptualization, Data curation, Formal analysis, Writing - original draft, Writing - review & editing, Methodology. Chen Li: Methodology, Resources, Supervision, Validation, Visualization. Muhammad Alam: Writing - review & editing, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (118)

  • Lauren, P., et al., Discriminant document embeddings with an extreme learning machine for classifying clinical narratives, Neurocomputing (2018).
  • Montejo-Ráez, A., et al., Ranked WordNet graph for sentiment polarity classification in Twitter, Comput. Speech Lang. (2014).
  • Serrano-Guerrero, J., et al., Sentiment analysis: A review and comparative analysis of web services, Inf. Sci. (Ny) (2015).
  • Twitter (2019).
  • T. Joachims, Text categorization with support vector machines: Learning with many relevant features, in: ...
  • Mikolov, T., et al., Efficient estimation of word representations in vector space.
  • Mikolov, T., et al., Distributed representations of words and phrases and their compositionality.
  • Bengio, Y., A neural probabilistic language model, J. Mach. Learn. Res. (2003).
  • Sebastiani, F., Machine learning in automated text categorization, ACM Comput. Surv. (2002).
  • Carrillo, M., et al., Combining text vector representations for information retrieval (2009).
  • Graves, A., et al., Speech recognition with deep recurrent neural networks.
  • Krizhevsky, A., et al., ImageNet classification with deep convolutional neural networks, Commun. ACM (2017).
  • Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the 2014 Conference on Empirical...
  • R. Collobert, J. Weston, A unified architecture for natural language processing, in: Proceedings of the 25th...
  • J. Pennington, R. Socher, C.D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the 2014...
  • D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning sentiment-specific word embedding for twitter sentiment...
  • Wang, Y., et al., Attention-based LSTM for aspect-level sentiment classification (2016).
  • Y. Zhang, B. Wallace, A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for...
  • Bojanowski, P., et al., Enriching word vectors with subword information.
  • The Stanford Natural Language Processing Group (2019).
  • Rong, X., Word2vec parameter learning explained (2019).
  • Sharma, Y., et al., Vector representation of words for sentiment analysis using GloVe (2017).
  • Tai, K.S., et al., Improved semantic representations from tree-structured long short-term memory networks.
  • Xiao, Y., et al., Efficient character-level document classification by combining convolution and recurrent layers (2016).
  • Zhang, X., et al., Character-level convolutional networks for text classification.
  • Shen, Y., et al., Learning semantic representations using convolutional neural networks for web search.
  • Rumelhart, D.E., et al., Parallel Distributed Processing: Explorations in the Microstructure of Cognition (1986).
  • Turian, J., et al., Word representations: A simple and general method for semi-supervised learning (2010).
  • Collobert, R., et al., Natural language processing (almost) from scratch, J. Mach. Learn. Res. (2011).
  • Mikolov, T., et al., Linguistic regularities in continuous space word representations.
  • Wang, H., et al., Feature-based sentiment analysis approach for product reviews, J. Softw. (2014).
  • Mikolov, T., et al., Statistical language models based on neural networks (2012).
  • Mnih, A., et al., A scalable hierarchical distributed language model.
  • Verma, B., et al., Sentiment analysis using lexicon and machine learning-based approaches: A survey (2018).
  • Taboada, M., et al., Lexicon-based methods for sentiment analysis, Comput. Linguist. (2011).
  • Ding, X., et al., A holistic lexicon-based approach to opinion mining.
  • Hu, M., et al., Mining and summarizing customer reviews.
  • Schneider, K.-M., A comparison of event models for Naive Bayes anti-spam e-mail filtering.
  • Zhang, L., et al., Combining lexicon-based and learning-based methods for twitter sentiment analysis.
  • A. Mudinas, D. Zhang, M. Levene, Combining lexicon and learning based approaches for concept-level sentiment analysis,...