An improved sentiment classification model based on data quality and word embeddings

Siagh, Asma; Laallam, Fatima Zohra; Kazar, Okba; Salem, Hajer

doi:10.1007/s11227-023-05099-1

Published: 03 March 2023

Volume 79, pages 11871–11894, (2023)

The Journal of Supercomputing

Asma Siagh¹, Fatima Zohra Laallam¹, Okba Kazar²,³ & Hajer Salem⁴

Abstract

User-generated content on social media platforms has reached big data levels. Sentiment analysis of this data offers valuable insights in many domains. However, real-world data is often class-imbalanced, which can harm a model's ability to generalize because it overfits to the majority class. An efficient model that can handle any imbalanced-data scenario is therefore needed in practice. To that end, this work proposes several models built on a study of how data quality and transfer learning through pre-trained embeddings affect minority class detection. The proposed models are tested on imbalanced datasets related to social media and education. The experimental results highlight the effectiveness of Word2vec, GloVe, and fastText embeddings with preprocessed data. In contrast, BERT embeddings give better results with unpreprocessed data. Furthermore, the best-performing model from this study outperforms other methods with notable improvements.
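
To make the class-imbalance problem in the abstract concrete, the following minimal sketch shows how a classifier that always predicts the majority class can still report high accuracy while detecting no minority-class examples. The label names, the 90/10 split, and the helper functions are hypothetical illustrations. They are not the authors' datasets or evaluation protocol, which this page does not describe.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced test set: 90 positive texts, 10 negative texts.
y_true = ["pos"] * 90 + ["neg"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["pos"] * 100

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
# Macro-averaged recall weights every class equally, regardless of its size.
macro_recall = sum(recall.values()) / len(recall)

print(f"accuracy={acc:.2f} recall={recall} macro_recall={macro_recall:.2f}")
```

Here accuracy is 0.90 even though recall on the minority class is 0.0. This is why work on imbalanced sentiment data usually reports per-class or macro-averaged metrics rather than accuracy alone.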

Data availability

The datasets used during the current study are publicly available online at https://data.mendeley.com/datasets/6ndwt6s5ry/1 and https://www.kaggle.com/datasets/septa97/100k-courseras-course-reviews-dataset.

Funding

The authors gratefully acknowledge financial support from “La Direction Générale de la Recherche Scientifique et du Développement Technologique (DGRSDT)” of Algeria.

Author information

Authors and Affiliations

Laboratoire d’INtelligence Artificielle et des Technologies de l’Information (LINATI), Department of Computer Science and Information Technologies, Kasdi Merbah University Ouargla, Ouargla, Algeria
Asma Siagh & Fatima Zohra Laallam
Smart Computer Science Laboratory (LINFI), Computer Science Department, University of Biskra, Biskra, Algeria
Okba Kazar
Department of Information Systems and Security, College of Information Technology, United Arab Emirates University, Al Ain, UAE
Okba Kazar
Pôle R&D, Audensiel Technologies, Rue Nationale, Boulogne-Billancourt, 92100, Ile-de-France, France
Hajer Salem

Contributions

Conceptualization: A.S.; Methodology: A.S.; Writing—original draft preparation: A.S.; Supervision: F.Z.L.; Supervision: O.K.; Review and editing: H.S.

Corresponding author

Correspondence to Asma Siagh.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Siagh, A., Laallam, F.Z., Kazar, O. et al. An improved sentiment classification model based on data quality and word embeddings. J Supercomput 79, 11871–11894 (2023). https://doi.org/10.1007/s11227-023-05099-1

Accepted: 04 February 2023
Published: 03 March 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11227-023-05099-1
