Complement Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database

Dewi, Christine; Chen, Rung-Ching

doi:10.1007/978-3-031-21743-2_7


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13757)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems


Abstract

Sentiment analysis (SA), also known as opinion mining, is the analysis of the subjective content of written text. SA is a critical technique in today's artificial intelligence (AI) field for extracting emotional information from large amounts of data. This study is based on the Internet Movie Database (IMDB) dataset, which comprises movie reviews and their associated positive or negative labels. The objective of our experiment is to identify the model with the best accuracy and the greatest generality. Text preprocessing is the first and most critical phase in a Natural Language Processing (NLP) system, since it significantly affects the overall accuracy of the classification algorithms. The experiment uses the Term Frequency-Inverse Document Frequency (TF-IDF) model for feature selection and extraction. Two families of classifiers are used in this work: Linear Model and Naïve Bayes. In addition, we explore the available loss functions: square_hinge, huber, modified_huber, log, epsilon_insensitive, and perceptron. In our experiments, ComplementNB achieves the highest accuracy, 75.13%, in both classification reports.
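The chapter's own code is not part of this preview, so the following is only a minimal pure-Python sketch of the kind of pipeline the abstract describes: TF-IDF features fed to a complement Naive Bayes classifier. It follows the complement formulation of Rennie et al. (2003) without the optional weight normalization. The toy reviews, the function names (`fit_tfidf`, `fit_complement_nb`, `classify`), the smoothed IDF formula, and `alpha=1.0` are all illustrative assumptions, not the authors' settings or data.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip HTML and punctuation.
    return text.lower().split()

def fit_tfidf(docs):
    """Learn smoothed IDF weights from tokenized training documents (assumed formula)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Map one tokenized document to {term: tf * idf}, ignoring terms unseen in training."""
    tf = Counter(t for t in doc if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}

def fit_complement_nb(vectors, labels, vocab, alpha=1.0):
    """For each class, estimate log term probabilities from documents of all OTHER classes."""
    weights = {}
    for c in sorted(set(labels)):
        comp = Counter()
        for vec, y in zip(vectors, labels):
            if y != c:
                for term, value in vec.items():
                    comp[term] += value
        total = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((comp[t] + alpha) / total) for t in vocab}
    return weights

def predict(vec, weights):
    # Choose the class whose complement explains the document worst (lowest score).
    return min(weights, key=lambda c: sum(v * weights[c][t] for t, v in vec.items()))

# Hypothetical toy reviews standing in for the IMDB data.
train = [
    ("a wonderful moving film with great acting", "positive"),
    ("great story and wonderful cast", "positive"),
    ("boring plot and terrible acting", "negative"),
    ("a terrible boring waste of time", "negative"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_tfidf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
weights = fit_complement_nb(vectors, labels, vocab=list(idf))

def classify(text):
    """Classify a raw review string with the fitted toy model."""
    return predict(tfidf(tokenize(text), idf), weights)

print(classify("wonderful great film"))
print(classify("terrible boring plot"))
```

Estimating each class's parameters from the documents outside that class is what separates the complement variant from standard multinomial Naive Bayes. The class name `ComplementNB` and the loss names in the abstract appear to correspond to scikit-learn's `ComplementNB` and the `loss` options of its `SGDClassifier`, where the hinge variant is spelled `squared_hinge`; that mapping is an inference, not something the abstract states.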

Acknowledgment

This paper is supported by the Ministry of Science and Technology, Taiwan, under grant Nos. MOST-107-2221-E-324-018-MY2 and MOST-109-2622-E-324-004.

Author information

Authors and Affiliations

Department of Information Management, Chaoyang University of Technology, Taichung, Republic of China
Christine Dewi & Rung-Ching Chen
Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Indonesia
Christine Dewi

Corresponding author

Correspondence to Rung-Ching Chen.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Vietnam National University, Ho Chi Minh City, Ho Chi Minh City, Vietnam
Tien Khoa Tran
Al-Farabi Kazakh National University, Almaty, Kazakhstan
Ualsher Tukayev
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński
University of Newcastle, Newcastle, NSW, Australia
Edward Szczerbicki

About this paper

Cite this paper

Dewi, C., Chen, RC. (2022). Complement Naive Bayes Classifier for Sentiment Analysis of Internet Movie Database. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13757. Springer, Cham. https://doi.org/10.1007/978-3-031-21743-2_7

DOI: https://doi.org/10.1007/978-3-031-21743-2_7
Published: 09 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21742-5
Online ISBN: 978-3-031-21743-2
eBook Packages: Computer Science, Computer Science (R0)
