Abstract
The internet and social media have made it easy to review products. Consumers share their thoughts, opinions, and experiences about products and services on forums, websites, and mobile apps, and many people now consult online reviews before deciding what to buy. Text sentiment analysis extracts insights and sentiments from social texts and consumers’ reviews. Organizations conduct this analysis to better understand their customers’ attitudes and feedback toward their products, and many researchers also study customers’ reviews by labeling them with a set of sentiments using text classification algorithms. This paper proposes a convolutional neural network (CNN) model that classifies the sentiment of text reviews as negative or positive. We also compare our proposed CNN model under several word embedding representations to identify the most efficient configuration. The experiments are conducted on the Amazon reviews dataset, and the different model designs achieve satisfactory performance. The results show the importance of keeping stop words in sentiment analysis tasks: eliminating them can lead to inaccurate sentiment predictions. In practice, the CNN model that uses stop words is 2% more accurate than the CNN model that ignores them. Furthermore, we find that random initialization provides better performance than vectors from supervised and word embedding models on large-scale datasets. Indeed, training the word embedding representation allows the model to learn better features in less computation time. Moreover, our CNN model outperforms the baseline machine learning and deep learning methods, and it improves the accuracy of the CNN to 90% on the Amazon reviews dataset.
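To make the stop-word finding concrete, the following minimal Python sketch shows one way stop-word removal can erase sentiment-bearing information. It is an illustration only, not the preprocessing pipeline used in the paper: the stop-word list, the `tokenize` helper, and the two example sentences are all assumptions introduced here. The list deliberately contains negations such as "not" and "no", as many standard English stop-word lists do.

```python
import re

# Illustrative stop-word list (an assumption, not the list used in the paper).
# Like many standard English lists, it includes negations such as "not" and "no".
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "it", "i",
              "do", "not", "no", "but", "very"}


def tokenize(text, remove_stop_words):
    """Lowercase the text, split it into word tokens, and optionally drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


# Two made-up reviews with opposite sentiment.
positive = "This product is good"
negative = "This product is not good"

# With stop words kept, the two reviews yield different token sequences.
kept = [tokenize(s, remove_stop_words=False) for s in (positive, negative)]

# With stop words removed, the negation disappears and both reviews collapse
# to the same token sequence, so no classifier can tell them apart.
dropped = [tokenize(s, remove_stop_words=True) for s in (positive, negative)]

print(kept[0] == kept[1])        # False
print(dropped[0] == dropped[1])  # True
```

In this toy case the negative review becomes indistinguishable from the positive one once stop words are filtered out, which is one plausible mechanism behind the accuracy gap reported above.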
Data availability
The data that support the findings of this study are openly available on Kaggle [19].
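For readers who want to load the data, the sketch below parses review lines under the assumption that the Kaggle copy of the dataset uses the fastText text format, in which each line starts with a `__label__1` (negative) or `__label__2` (positive) tag followed by the review text. The helper names, the sample lines, and the bz2-compressed file layout are assumptions made for illustration; check them against the files you actually download.

```python
import bz2


def parse_fasttext_line(line):
    """Split '__label__N review text' into (label, text).

    Assumed convention: __label__1 -> 0 (negative), __label__2 -> 1 (positive).
    """
    tag, _, text = line.strip().partition(" ")
    if tag == "__label__1":
        return 0, text
    if tag == "__label__2":
        return 1, text
    raise ValueError(f"unexpected label tag: {tag!r}")


def read_reviews(path):
    """Yield (label, text) pairs from a bz2-compressed fastText-format file.

    The path is supplied by the caller; the compression format is an assumption.
    """
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_fasttext_line(line)


# Made-up sample lines in the assumed format (not taken from the dataset).
sample = [
    "__label__2 Great sound: I love these headphones and use them daily.",
    "__label__1 Broke in a week: Do not buy this charger.",
]
parsed = [parse_fasttext_line(line) for line in sample]
print(parsed[0][0], parsed[1][0])  # 1 0
```

Mapping the two tags to 0 and 1 gives the binary negative/positive targets that the classification task described in the abstract requires.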
References
Gentzkow M, Kelly B, Taddy M (2019) Text as data. J Econ Lit 57(3):535–74
Varathan KD, Giachanou A, Crestani F (2017) Comparative opinion mining: a review. J Am Soc Inf Sci 68(4):811–829
Gautam G, Yadav D (2014) Sentiment analysis of twitter data using machine learning approaches and semantic analysis. In: 2014 Seventh International Conference on Contemporary Computing (IC3), IEEE, pp 437–442
Tripathy A, Agrawal A, Rath SK (2016) Classification of sentiment reviews using n-gram machine learning approach. Expert Syst Appl 57:117–126
Radhakrishnan A, Vaidhehi V (2017) Email classification using machine learning algorithms. Int J Eng Technol (IJET) 9(2):335–340
Acheampong FA, Wenyu C, Nunoo-Mensah H (2020) Text-based emotion detection: advances, challenges, and opportunities. Eng Rep 2(7):e12189
Su YJ, Hu WC, Jiang JH et al (2020) A novel LMAEB-CNN model for chinese microblog sentiment analysis. J Supercomput 76(11):9127–9141. https://doi.org/10.1007/s11227-020-03198-x
Priyadarshini I, Cotton C (2021) A novel LSTM-CNN-grid search-based deep neural network for sentiment analysis. J Supercomput 77(12):13911–13932. https://doi.org/10.1007/s11227-021-03838-w
Denecke K (2008) Using sentiwordnet for multilingual sentiment analysis. In: 2008 IEEE 24th International Conference on Data Engineering Workshop, IEEE, pp 507–512
Celikyilmaz A, Hakkani-Tür D, Feng J (2010) Probabilistic model-based sentiment analysis of twitter messages. In: 2010 IEEE Spoken Language Technology Workshop, IEEE, pp 79–84
Akaichi J (2013) Social networks’ facebook’statutes updates mining for sentiment classification. In: 2013 International Conference on Social Computing, IEEE, pp 886–891
Valakunde N, Patwardhan M (2013) Multi-aspect and multi-class based document sentiment analysis of educational data catering accreditation process. In: 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies, IEEE, pp 188–192
Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp 1746–1751, https://doi.org/10.3115/v1/D14-1181
Conneau A, Schwenk H, Barrault L, et al (2017) Very deep convolutional networks for text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Association for Computational Linguistics, Valencia, Spain, pp 1107–1116, https://aclanthology.org/E17-1104
Zhang Y, Zhang Z, Miao D et al (2019) Three-way enhanced convolutional neural networks for sentence-level sentiment classification. Inf Sci 477:55–64
Mandelbaum A, Shalev A (2016) Word embeddings and their use in sentence classification tasks. CoRR abs/1610.08229:16. arXiv:1610.08229
Kulkarni N et al (2021) A comparative study of word embedding techniques to extract features from text. Turkish J Comput Math Educ (TURCOMAT) 12(12):3550–3557
Zhang Y, Wallace BC (2017) A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. In: Kondrak G, Watanabe T (eds) Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, November 27 - December 1, 2017 - Volume 1: Long Papers. Asian Federation of Natural Language Processing, pp 253–263, https://aclanthology.org/I17-1026/
Amazon reviews dataset https://www.kaggle.com/bittlingmayer/amazonreviews accessed on 01/28/2023
Word2vec https://code.google.com/archive/p/word2vec/ Accessed on 01/28/2023
Mikolov T, Chen K, Corrado G, et al (2013a) Efficient estimation of word representations in vector space. In: Bengio Y, LeCun Y (eds) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, p 12, arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, et al (2013b) Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Ghahramani Z, et al (eds) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pp 3111–3119, https://proceedings.neurips.cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html
Mikolov T, Yih Wt, Zweig G (2013c) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 746–751
Glove https://nlp.stanford.edu/projects/glove/ accessed on 01/28/2023
Pennington J, Socher R, Manning CD (2014) Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543
Fasttext english vectors https://fasttext.cc/docs/en/english-vectors.html Accessed on 01/28/2023
Mikolov T, Grave E, Bojanowski P, et al (2017) Advances in pre-training distributed word representations. CoRR abs/1712.09405:4. arXiv:1712.09405
Fasttext supervised models https://fasttext.cc/docs/en/supervised-models.html Accessed on 01/28/2023
Joulin A, Grave E, Bojanowski P, et al (2017) Bag of tricks for efficient text classification. In: Lapata M, Blunsom P, Koller A (eds) Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers. Association for Computational Linguistics, pp 427–431, https://doi.org/10.18653/v1/e17-2068
Joulin A, Grave E, Bojanowski P, et al (2016) Fasttext.zip: Compressing text classification models. CoRR abs/1612.03651:13. arXiv:1612.03651
Gensim library https://pypi.org/project/gensim/ accessed on 01/28/2023
Minaee S, Kalchbrenner N, Cambria E et al (2021) Deep learning-based text classification: a comprehensive review. ACM Comput Surveys (CSUR) 54(3):1–40
Severyn A, Moschitti A (2015) Unitn: Training deep convolutional neural network for twitter sentiment classification. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp 464–469
Ouyang X, Zhou P, Li CH, et al (2015) Sentiment analysis using convolutional neural network. In: 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing pp 2359–2364
Kim H, Jeong YS (2019) Sentiment classification using convolutional neural networks. Appl Sci 9(11):2347
Yassine M, Hajj H (2010) A framework for emotion mining from text in online social networks. In: 2010 IEEE International Conference on Data Mining Workshops, IEEE, pp 1136–1142
Ghiassi M, Lee S (2018) A domain transferable lexicon set for twitter sentiment analysis using a supervised machine learning approach. Expert Syst Appl 106:197–216
Pang B, Lee L (2004) A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Scott D, Daelemans W, Walker MA (eds) Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 21-26 July, 2004, Barcelona, Spain. ACL, pp 271–278, https://doi.org/10.3115/1218955.1218990
Hasan A, Moin S, Karim A et al (2018) Machine learning-based sentiment analysis for twitter accounts. Math Comput Appl 23(1):11
Read J (2005) Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL Student Research Workshop, pp 43–48
Bifet A, Frank E (2010) Sentiment knowledge discovery in twitter streaming data. In: International Conference on Discovery Science. Springer, pp 1–15
Singh PK, Husain MS (2014) Methodological study of opinion mining and sentiment analysis techniques. Int J Soft Comput 5(1):11
Rashid A, Anwer N, Iqbal M et al (2013) A survey paper: areas, techniques and challenges of opinion mining. Int J Comput Sci Issues (IJCSI) 10(6):18
Smeureanu I, Bucur C (2012) Applying supervised opinion mining techniques on online user reviews. Informatica Economica 16(2):81
Cheong M, Lee V (2011) A microblogging-based approach to terrorism informatics: exploration and chronicling civilian sentiment and response to terrorism events via twitter. Inf Syst Front 13(1):45–59
Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N project report, Stanford 1(12):2009
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135. https://doi.org/10.1561/1500000001
Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5(4):1093–1113. https://doi.org/10.1016/j.asej.2014.04.011
Das B, Chakraborty S (2018) An improved text sentiment classification model using TF-IDF and next word negation. CoRR abs/1806.06407:6. arXiv:1806.06407
Cruz NP, Taboada M, Mitkov R (2016) A machine-learning approach to negation and speculation detection for sentiment analysis. J Am Soc Inf Sci 67(9):2118–2136
Kennedy A, Inkpen D (2006) Sentiment classification of movie reviews using contextual valence shifters. Comput Intell 22(2):110–125
Wan X (2012) A comparative study of cross-lingual sentiment classification. In: 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, IEEE, pp 24–31
Niu Z, Yin Z, Kong X (2012) Sentiment classification for microblog by machine learning. In: 2012 Fourth International Conference on Computational and Information Sciences, IEEE, pp 286–289
Neethu M, Rajasree R (2013) Sentiment analysis in twitter using machine learning techniques. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), IEEE, pp 1–5
Karamibekr M, Ghorbani AA (2013) A structure for opinion in social domains. In: 2013 International Conference on Social Computing, IEEE, pp 264–271
Bahrainian SA, Dengel A (2013) Sentiment analysis using sentiment features. In: 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), IEEE, pp 26–29
Antai R (2014) Sentiment classification using summaries: A comparative investigation of lexical and statistical approaches. In: 2014 6th Computer Science and Electronic Engineering Conference (CEEC), IEEE, pp 154–159
Nguyen H, Veluchamy A, Diop M et al (2018) Comparative study of sentiment analysis with product reviews using machine learning and lexicon-based approaches. SMU Data Sci Rev 1(4):7
Socher R, Huval B, Manning CD, et al (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp 1201–1211
Socher R, Perelygin A, Wu J, et al (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp 1631–1642
Dong L, Wei F, Tan C, et al (2014) Adaptive recursive neural network for target-dependent twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (volume 2: Short papers), pp 49–54
Tai KS, Socher R, Manning CD (2015) Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers. The Association for Computer Linguistics, pp 1556–1566, https://doi.org/10.3115/v1/p15-1150
Tang D, Qin B, Liu T (2015) Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp 1422–1432
Huang M, Cao Y, Dong C (2016) Modeling rich contexts for sentiment classification with LSTM. CoRR abs/1605.01478:9. arXiv:1605.01478
Qian Q, Huang M, Zhu X (2016) Linguistically regularized lstms for sentiment classification. CoRR abs/1611.03949:11. arXiv:1611.03949
Iqbal A, Amin R, Iqbal J et al (2022) Sentiment analysis of consumer reviews using deep learning. Sustainability 14(17):10844
Deriu J, Lucchi A, De Luca V, et al (2017) Leveraging large amounts of weakly supervised data for multi-language sentiment classification. In: Proceedings of the 26th International Conference on World Wide Web, pp 1045–1052
Devlin J, Chang MW, Lee K, et al (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171–4186, https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
Jin Z, Lai X, Cao J (2020) Multi-label sentiment analysis base on BERT with modified TF-IDF. In: 2020 IEEE International Symposium on Product Compliance Engineering-Asia (ISPCE-CN). IEEE. https://doi.org/10.1109/ispce-cn51288.2020.9321861
Liu Y, Lu J et al (2020) Sentiment analysis for e-commerce product reviews by deep learning model of bert-BiGRU-softmax. Math Biosci Eng 17(6):7819–7837. https://doi.org/10.3934/mbe.2020398
Prottasha NJ, Sami AA, Kowsher M et al (2022) Transfer learning for sentiment analysis using BERT based supervised fine-tuning. Sensors 22(11):4157. https://doi.org/10.3390/s22114157
Bilal M, Almazroi AA (2022) Effectiveness of fine-tuned BERT model in classification of helpful and unhelpful online customer reviews. Electron Commer Res. https://doi.org/10.1007/s10660-022-09560-w
Mutinda J, Mwangi W, Okeyo G (2023) Sentiment analysis of text reviews using lexicon-enhanced bert embedding (LeBERT) model with convolutional neural network. Appl Sci 13(3):1445. https://doi.org/10.3390/app13031445
Bello A, Ng SC, Leung MF (2023) A BERT framework to sentiment analysis of tweets. Sensors 23(1):506. https://doi.org/10.3390/s23010506
Zhang X, Zhao JJ, LeCun Y (2015) Character-level convolutional networks for text classification. In: Cortes C, Lawrence ND, Lee DD, et al (eds) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pp 649–657, https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html
Goldberg Y (2016) A primer on neural network models for natural language processing. J Artif Intell Res 57:345–420
Kocmi T, Bojar O (2017) An exploration of word embedding initialization in deep-learning tasks. In: Bandyopadhyay S (ed) Proceedings of the 14th International Conference on Natural Language Processing, ICON 2017, Kolkata, India, December 18-21, 2017. NLP Association of India, pp 56–64, https://aclanthology.org/W17-7508/
Gholamalinezhad H, Khosravi H (2020) Pooling methods in deep neural networks, a review. CoRR abs/2009.07485:16. arXiv:2009.07485
The complete Amazon reviews data http://snap.stanford.edu/data/web-Amazon-links.html Accessed on 01/28/2023
McAuley J, Leskovec J (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp 165–172
MR reviews dataset https://www.cs.cornell.edu/people/pabo/movie-review-data/ Accessed on 01/28/2023
Pang B, Lee L (2005) Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05). Association for Computational Linguistics, Ann Arbor, Michigan, pp 115–124, https://doi.org/10.3115/1219840.1219855, https://aclanthology.org/P05-1015
IMDB reviews dataset https://www.cs.cornell.edu/people/pabo/movie-review-data/ Accessed on 01/28/2023
Twitter dataset https://www.kaggle.com/code/durgeshrao9993/twitter-sentiment-analysis-with-nlp/data Accessed on 01/28/2023
Regex https://docs.python.org/3/library/re.html Accessed on 01/28/2023
NLTK https://www.nltk.org/ Accessed on 01/28/2023
Porter stemmer https://github.com/jedijulia/porter-stemmer Accessed on 01/28/2023
Karaa WBA, Gribâa N (2013) Information retrieval with porter stemmer: A new version for english. In: Advances in Intelligent Systems and Computing. Springer International Publishing, p 243–254, https://doi.org/10.1007/978-3-319-00951-3_24
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Authors and Affiliations
Contributions
A.B. contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qorich, M., El Ouazzani, R. Text sentiment classification of Amazon reviews using word embeddings and convolutional neural networks. J Supercomput 79, 11029–11054 (2023). https://doi.org/10.1007/s11227-023-05094-6
DOI: https://doi.org/10.1007/s11227-023-05094-6