Abstract
Sentiment analysis (SA), also known as opinion mining, examines the subjective content of written text. SA is a critical technique in today's artificial intelligence (AI) field for extracting emotional information from huge amounts of data. This study is based on the Internet Movie Database (IMDB) dataset, which comprises movie reviews and their associated positive or negative labels. The objective of our experiment is to identify the model with the best accuracy and the best generalization. Text preprocessing is the first and most critical phase in a Natural Language Processing (NLP) system, since it significantly impacts the overall accuracy of the classification algorithms. The experiment uses the Term Frequency-Inverse Document Frequency (TFIDF) model for feature selection and extraction. The following classifiers are used in this work: Linear Model and Naïve Bayes. We also explore several loss functions for the linear model: square_hinge, huber, modified_huber, log, epsilon_insensitive, and perceptron. In our experiments, ComplementNB achieves the highest accuracy, 75.13%, in both classification reports.
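The combination of TFIDF features and a Complement Naive Bayes classifier described above can be illustrated with a minimal from-scratch sketch. This is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF convention, the four toy reviews, and all function names (`fit_idf`, `tfidf`, `fit_complement_nb`, `classify`) are assumptions made for illustration. The classifier follows the general complement formulation of Rennie et al. (2003), in which each class is scored by how well the documents of all other classes explain the input.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Term frequency times IDF, L2-normalized; terms unseen in training are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def fit_complement_nb(vectors, labels, vocab, alpha=1.0):
    # For each class c, estimate smoothed term weights from documents NOT in c.
    weights = {}
    for c in sorted(set(labels)):
        comp = Counter()
        for vec, y in zip(vectors, labels):
            if y != c:
                for t, v in vec.items():
                    comp[t] += v
        total = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((comp[t] + alpha) / total) for t in vocab}
    return weights

def predict(vec, weights):
    # Choose the class whose complement matches the document worst (lowest score).
    return min(weights, key=lambda c: sum(v * weights[c][t] for t, v in vec.items()))

# Hypothetical toy reviews, not taken from the IMDB dataset.
train = [
    ("a wonderful moving film with great acting", "positive"),
    ("great story and wonderful cast", "positive"),
    ("a boring film with terrible acting", "negative"),
    ("terrible plot and boring dialogue", "negative"),
]
texts = [text for text, _ in train]
labels = [label for _, label in train]

idf = fit_idf(texts)
model = fit_complement_nb([tfidf(t, idf) for t in texts], labels, set(idf))

def classify(text):
    return predict(tfidf(text, idf), model)

print(classify("wonderful great cast"))
print(classify("boring terrible plot"))
```

In practice a library implementation would likely be used instead; scikit-learn, for example, provides `TfidfVectorizer` and `ComplementNB`, and its `SGDClassifier` exposes a `loss` parameter for linear models (where the squared hinge loss is spelled `squared_hinge`).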
Acknowledgment
This paper is supported by the Ministry of Science and Technology, Taiwan, under grant numbers MOST-107-2221-E-324-018-MY2 and MOST-109-2622-E-324-004.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dewi, C., Chen, RC. (2022). Complement Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13757. Springer, Cham. https://doi.org/10.1007/978-3-031-21743-2_7
DOI: https://doi.org/10.1007/978-3-031-21743-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21742-5
Online ISBN: 978-3-031-21743-2
eBook Packages: Computer Science, Computer Science (R0)