
Sentiment analysis of movie reviews based on NB approaches using TF–IDF and count vectorizer

  • Original Article
Social Network Analysis and Mining

Abstract

Movies have been important in our lives for many years. Movies provide entertainment, inspire, educate, and offer an escape from reality. Movie reviews help us choose better movies, but reading them all can be time-consuming and overwhelming. To make this easier, sentiment analysis can classify movie reviews into positive and negative categories. Sentiment analysis (SA), also called opinion mining (OP), uses natural language processing to identify and extract opinions expressed in text. Naive Bayes, a supervised learning algorithm, is simple and efficient because it assumes that features are conditionally independent given the class, and it often performs strongly in text classification tasks. This study evaluates the performance of four Naive Bayes variants using two vectorization techniques, Count Vectorizer and Term Frequency–Inverse Document Frequency (TF–IDF), on two movie review datasets: the IMDb Movie Reviews Dataset and Rotten Tomatoes Movie Reviews. Bernoulli Naive Bayes achieved the highest accuracy using Count Vectorizer on both the IMDb and Rotten Tomatoes datasets, whereas Multinomial Naive Bayes achieved better accuracy on the IMDb dataset with TF–IDF. During preprocessing, we applied several techniques to improve the quality of the datasets: data cleaning, spelling correction, converting chat words (informal abbreviations) to their full forms, lemmatization, and stop-word removal. We also tuned the models' hyperparameters to obtain the best results. Using TF–IDF, we observed a slight performance improvement compared to using the count vectorizer. The experiment highlights the significant role of sentiment analysis in understanding the attitudes and emotions expressed in movie reviews. By predicting the sentiment of each review and averaging the sentiment across all reviews, it becomes possible to make an accurate prediction about a movie’s overall performance.
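A minimal sketch of the kind of comparison described above, assuming scikit-learn: each of the four NB variants named in the abbreviation list (MNB, BNB, CNB, GNB) is paired with `CountVectorizer` and `TfidfVectorizer`, tuned by grid search with cross-validation, and the best pipeline is then used to average predicted sentiment over a movie's reviews. The toy reviews, the chat-word dictionary, the hyperparameter grids, and the helper names (`preprocess`, `build_pipeline`, `compare`, `average_sentiment`) are hypothetical; this is not the authors' code. Spelling correction and lemmatization are left out to avoid extra dependencies.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical chat-word dictionary; the authors' actual mapping is not given.
CHAT_WORDS = {"lol": "laughing out loud", "imo": "in my opinion", "gr8": "great"}


def preprocess(text):
    """Lowercase, strip HTML tags and punctuation, expand chat words, drop stop words.

    Spelling correction and lemmatization are omitted in this sketch.
    Caveat: scikit-learn's stop-word list also removes negations such as "not".
    """
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = []
    for tok in text.split():
        tokens.extend(CHAT_WORDS.get(tok, tok).split())
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


def to_dense(X):
    # GaussianNB rejects sparse matrices, so densify (acceptable for small corpora only).
    return X.toarray()


def build_pipeline(vectorizer, model):
    steps = [("vec", vectorizer)]
    if isinstance(model, GaussianNB):
        steps.append(("dense", FunctionTransformer(to_dense)))
    steps.append(("nb", model))
    return Pipeline(steps)


def compare(texts, labels, cv=3):
    """Grid-search every (vectorizer, NB variant) pair; keep mean CV accuracy and the fitted search."""
    cleaned = [preprocess(t) for t in texts]
    vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
    alphas = {"nb__alpha": [0.1, 0.5, 1.0]}  # hypothetical smoothing grid
    models = {
        "MNB": (MultinomialNB, alphas),
        "BNB": (BernoulliNB, alphas),
        "CNB": (ComplementNB, alphas),
        "GNB": (GaussianNB, {"nb__var_smoothing": [1e-9, 1e-6, 1e-3]}),
    }
    results = {}
    for vec_name, vec_cls in vectorizers.items():
        for model_name, (model_cls, grid) in models.items():
            search = GridSearchCV(
                build_pipeline(vec_cls(), model_cls()), grid, cv=cv, scoring="accuracy"
            )
            search.fit(cleaned, labels)  # refits the best setting on all the data
            results[(vec_name, model_name)] = (search.best_score_, search)
    return results


def average_sentiment(model, reviews):
    """Fraction of a movie's reviews predicted positive (label 1)."""
    return float(np.mean(model.predict([preprocess(r) for r in reviews])))


# Toy reviews (1 = positive, 0 = negative), made up for illustration only.
reviews = [
    "An amazing, wonderful film with brilliant acting",
    "Loved it, gr8 story and superb direction lol",
    "Brilliant script, wonderful cast, truly amazing",
    "A superb and moving masterpiece, loved every minute",
    "Great fun, excellent pacing and a wonderful score",
    "Excellent film imo, brilliant and moving",
    "Terrible plot, awful acting and boring scenes",
    "A boring, dull mess with awful dialogue",
    "Worst movie ever, terrible and painfully dull",
    "Awful script, boring pacing, a total waste",
    "Dull characters and a terrible, lazy ending",
    "Painfully boring imo, the worst waste of time",
]
labels = [1] * 6 + [0] * 6

results = compare(reviews, labels)
for (vec, model), (score, _) in sorted(results.items()):
    print(f"{vec:>5} + {model}: mean CV accuracy {score:.2f}")
best = max(results.values(), key=lambda r: r[0])[1]
print("share of positive reviews:", average_sentiment(best, ["Wonderful film", "Boring and awful"]))
```

On a dozen toy reviews the accuracy figures carry no meaning; the sketch only shows the structure of the experiment. Note that `GaussianNB` needs dense input, and the densifying step does not scale to full-size review vocabularies. Also, with its default `binarize=0.0`, `BernoulliNB` turns every nonzero count or TF–IDF weight into 1, so in this sketch it sees identical features from both vectorizers.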

Figs. 1–9

Data Availability Statement

The data that support the findings of this study are openly available at https://github.com/Ankit152/IMDB-sentiment-analysis.git (IMDb reviews) and https://www.kaggle.com/datasets/talha002/rottentomatoes-400k-review (Rotten Tomatoes reviews).

Abbreviations

ABSA:

Aspect-based sentiment analysis

AI:

Artificial intelligence

BOWs:

Bag-of-words

BNB:

Bernoulli Naive Bayes

CNB:

Complement Naive Bayes

CV:

Cross-validation

DL:

Deep learning

GNB:

Gaussian Naive Bayes

GS:

Grid search

IMDB:

Internet Movie Database

KNN:

K-Nearest Neighbours

SVM:

Support vector machines

ML:

Machine learning

MNB:

Multinomial Naive Bayes

NB:

Naive Bayes

NLP:

Natural language processing

NLTK:

Natural Language Toolkit

OP:

Opinion mining

RT:

Rotten Tomatoes

TP:

True Positive

TN:

True Negative

FP:

False Positive

FN:

False Negative

SA:

Sentiment analysis

TF–IDF:

Term Frequency–Inverse Document Frequency

Word2vec:

Word to vector
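For reference, the textbook definitions behind two of the terms above are sketched below, with N the number of documents and df(t) the number of documents containing term t. These are standard forms, not necessarily the exact variants used in the study; scikit-learn's `TfidfVectorizer`, for example, defaults to the smoothed idf shown on the second line and then L2-normalizes each document vector, while a count vectorizer uses the raw term frequency tf(t, d) alone.

```latex
\begin{aligned}
\mathrm{tfidf}(t, d) &= \mathrm{tf}(t, d) \cdot \mathrm{idf}(t), \qquad
  \mathrm{idf}(t) = \ln \frac{N}{\mathrm{df}(t)} \\
\mathrm{idf}_{\text{smooth}}(t) &= \ln \frac{1 + N}{1 + \mathrm{df}(t)} + 1 \\
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
```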

Acknowledgements

We sincerely thank everyone who helped us complete this research paper. We are grateful to the participants for their helpful feedback and ideas, which improved our research methods and the quality of our results, and to everyone who gave their time to join our study; this research would not have been possible without them.

Funding

This paper is for free publication.

Author information

Contributions

The author contributions are as follows: conceptualization, MMD and SSK; methodology, MBG and MK; software, MMD and SU; validation, SSK and WK; formal analysis, MK, WK, and MBG; investigation, SU; data curation, SU and SSK; writing (original draft preparation), MMD and MBG; writing (review and editing), SSK; visualization, MBG and MK.

Corresponding author

Correspondence to Muzammil Khan.

Ethics declarations

Conflict of interest

The authors of this paper declare that they do not have any conflicts of interest.

Financial interests

The authors of this paper have no financial conflicts of interest relevant to this article’s content to declare.

Ethical approval

Not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Danyal, M.M., Khan, S.S., Khan, M. et al. Sentiment analysis of movie reviews based on NB approaches using TF–IDF and count vectorizer. Soc. Netw. Anal. Min. 14, 87 (2024). https://doi.org/10.1007/s13278-024-01250-9

  • DOI: https://doi.org/10.1007/s13278-024-01250-9
