Multi-source social media data sentiment analysis using bidirectional recurrent convolutional neural networks
Introduction
Social media has grown phenomenally over the past decade, and platforms such as Twitter, Facebook, LinkedIn, WeChat, and Instagram are now used extensively. For instance, Twitter has 321 million active users [1]. On these platforms, people habitually express their opinions, suggestions, likes, dislikes, and experiences in a profoundly casual manner. This informal style of interaction motivates researchers to extract valuable insights using tools and methods such as speech recognition and sentiment analysis. Such analysis involves a number of challenging tasks related to syntactic, lexical, and semantic regularities.
In the 1990s, discrete vector-space representations were among the most popular natural language representation models. These models include Bag-of-Words (BoW), Term Frequency (TF), and Term Frequency–Inverse Document Frequency (TF–IDF) or the count vectorizer, all used for feature extraction. However, BoW was found to be problematic because it ignores both semantics and word order. Extensions such as Continuous Bag-of-Words (CBOW) and N-gram models mostly perform best [2] as statistical language models for predicting a word, as described in [3]. A further improvement, proposed in [4], incorporates sampling and additional sub-sampling for continuous words. Later, the dense word representation technique of [5], known as word embedding, was found useful for handling the information sparsity of n-gram models. For sentiment detection, it is essential to understand the meaning of words and how they are used. It is therefore helpful to map words into a vector space so that they can serve as features, as is done in many areas such as document classification [6] and information retrieval [7]. Natural Language Processing (NLP) techniques are used to understand subjectivity in sentences by focusing on content and phonetics, relying on pattern recognition and the learning of words.
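The word-order limitation of count-based representations can be illustrated with a minimal sketch. The weighting below (term frequency divided by document length, multiplied by an unsmoothed log inverse document frequency) is only one common TF-IDF variant, chosen here as an assumption; the example sentences and the function names `bag_of_words` and `tf_idf` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one tokenized document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def tf_idf(documents):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    vocabulary = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = {w: sum(1 for doc in documents if w in doc) for w in vocabulary}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([
            (counts[w] / len(doc)) * math.log(n_docs / doc_freq[w])
            for w in vocabulary
        ])
    return vocabulary, vectors


# Toy corpus (made up): the first two sentences differ only in word order.
docs = [
    "the movie was good not bad".split(),
    "the movie was bad not good".split(),
    "the plot was dull".split(),
]
vocab, weights = tf_idf(docs)
bow_first = bag_of_words(docs[0], vocab)
bow_second = bag_of_words(docs[1], vocab)
```

In this sketch the first two sentences express opposite sentiment yet receive identical count and TF-IDF vectors, which is the kind of ordering problem that motivates the sequence-aware models discussed later.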
With advances in artificial intelligence (AI), deep neural networks (DNNs) help algorithms learn by overcoming the issue of information sparsity and make it possible to train complex models on relatively large datasets, as presented in several works [8], [9], [10]. To capture semantic regularities, neural networks (NNs) use word embeddings, which place grammatically or semantically related words close to each other in vector space. Word embeddings have broad applicability in applications such as machine translation, and especially in sentiment analysis, which relies on the same kind of relatedness in the vector space. A prominent word embedding model that uses a convolutional neural network (CNN) was designed by Collobert and Weston (C&W) and presented in [11].
Moreover, among the numerous deep learning models of word embedding, Global Vectors (GloVe) [12] and Word2Vec [3] are used to capture syntactic and semantic relations. These embedding models have been well received by researchers in sentiment analysis and are useful deep learning methods for constructing vector representations of words. These models are used in [13], [14], [15], [16]. However, these representations have been found to be problematic to some extent: training a vector for each word requires a large corpus and does not work well with small corpora, as noted in [17]. Similarly, researchers with small datasets have to rely on DWRs such as GloVe and Word2Vec, which fall short in a number of respects, such as time complexity, computational cost, and effectiveness, as presented in [18], [19].
Furthermore, these techniques map words with dissimilar sentiment to nearby vectors because they disregard the sentiment information in text, which ultimately compromises model precision. When computing word vectors, these embeddings also ignore the context of the document. To capture the morphological forms of words, a variation of Word2Vec called fastText was introduced in [20]. It produces effective vector representations of rare words by concentrating on the internal structure of words. Learning with fastText improves performance on syntactic tasks because the morphology of words is related to their syntactic role. In addition, word vectors must represent the words present in the vocabulary in a compact format.
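The idea of looking at the internal structure of a word can be sketched by extracting character n-grams with boundary markers, which is how subword models of this kind typically decompose a word. The function name `char_ngrams` and the n-gram range of 3 to 6 are illustrative assumptions; the sketch only lists the subword units and does not train or load any vectors.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """List the character n-grams of a word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the marked word.
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


# Trigrams only, for readability.
trigrams = char_ngrams("where", 3, 3)

# Two morphologically related words share many subword units, so a rare
# form can borrow information from a frequent one.
shared = set(char_ngrams("unhappy")) & set(char_ngrams("unhappiness"))
```

Because a word's representation is built from such units, a rare or unseen word still overlaps with known words through the n-grams they have in common.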
For text classification, part-of-speech (POS) tagging is an effective method because it provides useful information about a word and its neighbors. To handle syntactic regularities, it splits sentences into words and assigns each word a tag according to POS labeling rules, such as adverb, noun, adjective, and verb [21]. It can also reveal similarities and dissimilarities between words, which are a valuable source for sentiment prediction. Although embeddings and language models have been useful in many tasks, they are still unable to capture the connections between words in a sentence, which are essential for dependency parsing to establish relations between words. Word2Vec relies on associated words to understand word combinations [22] as well as syntactic and semantic regularities, irrespective of word occurrence. Therefore, GloVe is considered exceptional at interpreting sentiment from meaning, as presented in [23].
In machine learning, many methods for text classification have been proposed, for example, k-nearest neighbors (KNN), Naïve Bayes, Support Vector Machines (SVMs), Decision Trees, and K-Means. In contrast, models based on neural networks have achieved better results in word representation than conventional techniques that deal with semantics, such as Latent Semantic Analysis. Among the many neural networks, CNNs and recurrent neural networks (RNNs) have produced better results, especially on tasks related to text classification with sentiment detection. For instance, comprehensive experimental work is presented in [24] to improve tree-structured semantic representations by using RNN variants and long short-term memory (LSTM). To encode character-level input into high-level features, CNNs and RNNs are combined to learn the input sequence [25]. To extract higher-level features, CNNs [26] need to rely on several convolutional layers to capture long-term dependencies because of the locality of the convolution and pooling operations. This problem can be addressed by using a recurrent neural network, where a single layer is capable of holding long-term dependencies. Overall, neural network-based methods [27] take vector representations as input and pass them through several layers to predict sentiment.
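The locality argument can be made concrete with a small calculation. Assuming stride-1, undilated 1-D convolutions with no pooling between them (a simplification, not the configuration of any cited model), each extra layer widens the receptive field by `kernel_size - 1` tokens. The helper names below are hypothetical.

```python
def receptive_field(num_layers, kernel_size):
    """Tokens visible to one output unit after stacking stride-1, undilated 1-D convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def layers_needed(span, kernel_size):
    """Smallest number of stacked layers whose receptive field covers `span` tokens."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 for the field to grow")
    layers = 0
    while receptive_field(layers, kernel_size) < span:
        layers += 1
    return layers


# A dependency spanning 30 tokens, seen through width-3 convolutions.
depth_for_30_tokens = layers_needed(30, 3)
```

Under these assumptions, relating two words 30 tokens apart takes 15 stacked width-3 layers, whereas a recurrent layer carries its state across the whole sequence in a single layer.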
This work carries out sentiment analysis with three aims. First, it searches for an adequate DWR to address the problems above using GloVe, Word2Vec, and fastText with sub-word information. Second, it generates an efficient sentence vector by concatenating these DWRs with part-of-speech tags to improve the evaluation metrics. Finally, it uses a joint neural architecture comprising Bi-LSTM (bidirectional long short-term memory) and Bi-GRU (bidirectional gated recurrent unit), which process text as sequential information in both directions to hold long-term dependencies, together with a CNN, which bounds the receptive fields of the hidden layer regardless of their position. Additionally, convolutional layers extract local features that are useful for predicting sentiment. Our proposed methodology also differs from many recent approaches in its use of windows of various weights and lengths to capture long-term dependencies in a bidirectional manner. Meanwhile, the CNN, with attentive pooling, moves efficiently from left to right rather than over the entire expression by selecting adequate positions of the encoded features.
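The second aim, building a sentence representation by concatenating several word representations with part-of-speech information, can be sketched as follows. This is only an illustration under stated assumptions: the tiny lookup tables stand in for pre-trained vectors and contain made-up numbers, the one-hot POS encoding and the zero vector for unknown words are choices made here rather than details given in the text, and all names are hypothetical.

```python
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "OTHER"]

# Toy lookup tables standing in for pre-trained Word2Vec, GloVe, and fastText
# vectors. All values are made up for illustration.
WORD2VEC = {"great": [0.1, 0.3], "movie": [0.4, -0.2]}
GLOVE = {"great": [0.5, 0.0, 0.2], "movie": [-0.1, 0.6, 0.3]}
FASTTEXT = {"great": [0.7, 0.1], "movie": [0.0, 0.9]}


def one_hot(tag):
    """Encode a POS tag as a one-hot vector; unknown tags fall back to OTHER."""
    tag = tag if tag in POS_TAGS else "OTHER"
    return [1.0 if tag == t else 0.0 for t in POS_TAGS]


def lookup(table, word):
    """Return the word's vector, or a zero vector of the table's width if unseen."""
    dim = len(next(iter(table.values())))
    return table.get(word, [0.0] * dim)


def concatenated_vector(word, tag):
    """Concatenate the three embeddings and the POS encoding, in a fixed order."""
    return (lookup(WORD2VEC, word) + lookup(GLOVE, word)
            + lookup(FASTTEXT, word) + one_hot(tag))


def sentence_matrix(tagged_tokens):
    """Stack one concatenated row per (word, tag) pair."""
    return [concatenated_vector(word, tag) for word, tag in tagged_tokens]


matrix = sentence_matrix([("great", "ADJ"), ("movie", "NOUN"), ("zzz", "NOUN")])
```

Each row here has 2 + 3 + 2 + 5 = 12 entries, and the resulting matrix is the kind of input a recurrent or convolutional layer would consume token by token.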
The main contributions of this work are as follows:
1. With respect to unsupervised learning, we evaluate the effectiveness of neural language models for initializing distributed word representations enriched with sentiment-related words.
2. To improve the efficacy of word embeddings, we propose an efficient word representation approach named dense efficient concatenated representation (DECR), based on sequential concatenation and a POS tagger, to generate a novel representation.
3. These representations, along with DECR and appropriate hyperparameters, are used as input to the proposed architecture, which primarily varies the RNN layer with bidirectional variants to learn long-term dependencies. We further extend the CNN with a weighted attentive operation using windows of various lengths and weights to generate feature maps, achieving competitive results while tuning only a few parameters.
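As a rough illustration of attention-weighted pooling over convolutional features, the sketch below scores each position against a query vector, normalizes the scores with a softmax, and returns the weighted sum. It is a generic formulation written for this illustration and is not claimed to match the weighted attentive operation proposed in the paper; the `context` vector stands in for a learned parameter, and the feature values are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(features, context):
    """Pool per-position feature vectors into one vector using attention weights.

    features: list of equal-length vectors, one per sequence position.
    context:  hypothetical query vector (a learned parameter in practice).
    """
    # Dot-product score of each position against the context vector.
    scores = [sum(f * c for f, c in zip(vec, context)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, features))
              for i in range(dim)]
    return pooled, weights


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attentive_pool(features, context=[2.0, 0.0])
uniform_pooled, uniform_weights = attentive_pool(features, context=[0.0, 0.0])
```

With a zero context vector every position gets the same weight and the result reduces to average pooling, while a non-zero context shifts weight toward the positions it scores highly.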
The rest of the paper is structured as follows. Section 2 presents related work. Section 3 presents the proposed joint neural network. Section 4 contains the experiments. Section 5 provides the discussion. Finally, Section 6 concludes the paper.
Section snippets
Related work
With respect to NLP, a considerable amount of relevant research has addressed sentence classification tasks for sentiment analysis. NLP models incorporate associations between the atomic, discrete symbols defined by words in order to perceive information. Over the last two decades, words have increasingly been represented as continuous vectors, as introduced by [28], [29], [30]. It is observed that the representation of words through vectors gets exceptional
Bi-directional recurrent convolutional architecture for sentiment analysis
This section presents a theoretical model of the proposed methodology. The pre-processing phase covers the dataset description and its pre-processing, along with the representation of pre-trained word vectors through Word2Vec [3], GloVe [12], and fastText with sub-word (SW) information [20]. Further, a new DECR is introduced and used with bidirectional long short-term memory and gated recurrent units (Bi-LSTM and Bi-GRU). In the post-processing phase, a CNN comprising pooling to
Experimentations
To examine the considered techniques, our work builds an effective sentiment analysis classifier consisting of RNN variants and a CNN. The integration process includes the multiple DWRs mentioned above, as well as an evaluation of the influence of the associated hyperparameters. It has been observed that the results, without testing numerous configurations (hyperparameter tuning), affect the training of the network with respect to the available hardware, which eventually influences the overall
Discussion
Currently, considerable improvements have been made through deep learning approaches to numerous NLP tasks. Our work has promoted the accessibility of DWRs in the form of pre-trained word vectors derived from large corpora across various domains. We investigate the quality of these representations for sentiment analysis. We have examined factors such as the training method and the training corpus size that affect the quality of the proposed method. For sentiment analysis, it is significant to
Conclusion
We conclude our work with specific results. Initially, we applied domain-specific pre-processing techniques and POS tagging to multi-source datasets to strengthen the efficiency of distributed word representations, which outperform vocabulary sense. Further, we have described the use of different word representations, including Word2Vec, GloVe, and fastText, and the mixing of these representations through a weighted mechanism as opposed to relying on a single learning model. We perceived that models trained
CRediT authorship contribution statement
Fazeel Abid: Conceptualization, Data curation, Formal analysis, Writing - original draft, Writing - review & editing, Methodology. Chen Li: Methodology, Resources, Supervision, Validation, Visualization. Muhammad Alam: Writing - review & editing, Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (118)
- et al. Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis. Neurocomputing (2017)
- et al. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Syst. Appl. (2017)
- et al. A topic-enhanced word embedding for Twitter sentiment classification. Inf. Sci. (Ny) (2016)
- et al. Sentiment analysis leveraging emotions and word embeddings. Expert Syst. Appl. (2017)
- et al. Learning internal representations by error propagation
- Finding structure in time. Cogn. Sci. (1990)
- et al. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. (2014)
- et al. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowl.-Based Syst. (2015)
- et al. Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization. Procedia Eng. (2013)
- et al. An empirical convolutional neural network approach for semantic relation classification. Neurocomputing (2016)
- Discriminant document embeddings with an extreme learning machine for classifying clinical narratives. Neurocomputing
- Ranked wordnet graph for sentiment polarity classification in Twitter. Comput. Speech Lang.
- Sentiment analysis: A review and comparative analysis of web services. Inf. Sci. (Ny)
- Efficient estimation of word representations in vector space
- Distributed representations of words and phrases and their compositionality
- A neural probabilistic language model. J. Mach. Learn. Res.
- Machine learning in automated text categorization. ACM Comput. Surv.
- Combining Text Vector Representations for Information Retrieval
- Speech recognition with deep recurrent neural networks
- ImageNet classification with deep convolutional neural networks. Commun. ACM
- Attention-based LSTM for aspect-level sentiment classification
- Enriching word vectors with subword information
- The Stanford natural language processing group
- Word2vec parameter learning explained
- Vector representation of words for sentiment analysis using GloVe
- Improved semantic representations from tree-structured long short-term memory networks
- Efficient character-level document classification by combining convolution and recurrent layers
- Character-level convolutional networks for text classification
- Learning semantic representations using convolutional neural networks for web search
- Parallel Distributed Processing: Explorations in the Microstructure of Cognition
- Word Representations: A Simple and General Method for Semi-Supervised Learning
- Natural language processing (almost) from scratch. J. Mach. Learn. Res.
- Linguistic regularities in continuous space word representations
- Feature-based sentiment analysis approach for product reviews. J. Softw.
- Statistical language models based on neural networks. Wall Str. J.
- A scalable hierarchical distributed language model
- Sentiment Analysis using Lexicon and Machine Learning-Based Approaches: A Survey
- Lexicon-based methods for sentiment analysis. Comput. Linguist.
- A holistic lexicon-based approach to opinion mining
- Mining and summarizing customer reviews
- A comparison of event models for Naive Bayes anti-spam e-mail filtering
- Combining lexicon-based and learning-based methods for Twitter sentiment analysis
- 1 Distributed word representations.
- 2 Weighted Attentive Pooling.
- 3 Dense Efficient Concatenated Representation.