Abstract
Movies have long been part of everyday life: they entertain, inspire, educate, and offer an escape from reality. Movie reviews help viewers choose what to watch, but reading them all is time-consuming and overwhelming. Sentiment analysis eases this burden by classifying movie reviews as positive or negative. Opinion mining (OP), also called sentiment analysis (SA), uses natural language processing to identify and extract opinions expressed in text. Naïve Bayes, a supervised learning algorithm built on a feature independence assumption, is simple, efficient, and performs strongly in classification tasks. This study evaluates four Naïve Bayes variants with two vectorization techniques, Count Vectorizer and Term Frequency–Inverse Document Frequency (TF–IDF), on two movie review datasets: the IMDb Movie Reviews Dataset and Rotten Tomatoes Movie Reviews. During preprocessing, we applied data cleaning, spelling correction, fixing of chat words, lemmatization, and stop-word removal to improve the quality of the datasets, and we tuned the hyperparameters of the models to obtain the best results. Bernoulli Naïve Bayes achieved the highest accuracy with Count Vectorizer on both the IMDB and Rotten Tomatoes datasets, whereas Multinomial Naïve Bayes achieved better accuracy on the IMDB dataset with TF–IDF. Overall, TF–IDF gave a slight performance improvement over Count Vectorizer. The experiment highlights the role of sentiment analysis in understanding the attitudes and emotions expressed in movie reviews. By predicting the sentiment of each review and averaging over all reviews, it becomes possible to predict a movie’s overall performance.
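The pipeline summarized above (count-based features, a Naïve Bayes classifier, then averaging the per-review predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a whitespace tokenizer in place of the preprocessing described in the abstract, and a hand-written multinomial Naïve Bayes with add-one smoothing. The class name `TinyMultinomialNB`, the toy reviews, and the labels (1 = positive, 0 = negative) are all hypothetical, and TF–IDF weighting is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for fuller preprocessing)."""
    return text.lower().split()


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes over raw term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        # Log prior of each class from its share of the training documents.
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        # Per-class term counts: the bag-of-words statistics.
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.total = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            denom = self.total[c] + self.alpha * len(self.vocab)
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.term_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy training data: 1 = positive, 0 = negative.
train_docs = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull terrible waste of time",
]
train_labels = [1, 1, 0, 0]

model = TinyMultinomialNB().fit(train_docs, train_labels)

# Predict each new review, then average the labels for one movie.
new_reviews = [
    "wonderful story and great acting",
    "terrible and boring",
    "a great film",
]
predictions = [model.predict(review) for review in new_reviews]
average_sentiment = sum(predictions) / len(predictions)
print(predictions, round(average_sentiment, 3))
```

With labels coded as 0 and 1, the average is simply the fraction of reviews predicted positive, which is one plausible reading of the "average sentiment of all reviews" mentioned in the abstract.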
Data Availability Statement
The data that support the findings of this study are openly available on GitHub at https://github.com/Ankit152/IMDB-sentiment-analysis.git and on Kaggle at https://www.kaggle.com/datasets/talha002/rottentomatoes-400k-review.
Abbreviations
- ABSA: Aspect-based sentiment analysis
- AI: Artificial intelligence
- BOWs: Bag-of-words
- BNB: Bernoulli Naive Bayes
- CNB: Complement Naive Bayes
- CV: Cross-validation
- DL: Deep learning
- GNB: Gaussian Naive Bayes
- GS: Grid search
- IMDB: Internet movie database
- KNN: K-Nearest Neighbours
- SVM: Support vector machines
- ML: Machine learning
- MNB: Multinomial Naive Bayes
- NB: Naive Bayes
- NLP: Natural language processing
- NLTK: Natural language tool kit
- OP: Opinion mining
- RT: Rotten Tomatoes
- TP: True Positive
- TN: True Negative
- FP: False Positive
- FN: False Negative
- SA: Sentiment analysis
- TF–IDF: Term Frequency–Inverse Document Frequency
- Word2vec: Word to vector
Acknowledgements
We sincerely thank everyone who helped us complete this research paper. We are grateful to the participants for their helpful feedback and ideas, which improved our research methods and the quality of our results. This research would not have been possible without those who gave their time to join our study.
Funding
This paper is for free publication.
Author information
Contributions
The author contributions are as follows: conceptualization, MMD and SSK; methodology, MBG and MK; software, MMD and SU; validation, SSK and WK; formal analysis, MK, WK, and MBG; investigation, SU; data curation, SU and SSK; writing (original draft preparation), MMD and MBG; writing (review and editing), SSK; visualization, MBG and MK.
Ethics declarations
Conflict of interest
The authors of this paper declare that they do not have any conflicts of interest.
Financial interests
The authors of this paper have no conflict of interest relevant to this article’s content to declare.
Ethical approval
Not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Danyal, M.M., Khan, S.S., Khan, M. et al. Sentiment analysis of movie reviews based on NB approaches using TF–IDF and count vectorizer. Soc. Netw. Anal. Min. 14, 87 (2024). https://doi.org/10.1007/s13278-024-01250-9
DOI: https://doi.org/10.1007/s13278-024-01250-9