Machine learning-based new approach to films review

  • Original Article
Social Network Analysis and Mining

Abstract

The main purpose of sentiment analysis (SA) is to derive useful insights from large amounts of unstructured data compiled from various sources. It interprets and classifies textual data using techniques drawn from machine learning (ML). In this paper, we compare simple and ensemble ML methods as classifiers for SA: Random Forest, K-Nearest Neighbor, Artificial Neural Network, Gradient Boosting, Support Vector Machine (SVM), AdaBoost, Extreme Gradient Boosting, Decision Tree, LightGBM, Stochastic Gradient Descent, and Bagging. We use a dataset of 50,000 movie reviews, of which 25,000 are labeled positive and 25,000 negative, and we select 20,000 words that influence the sentiment of the documents. The aim of this work is to propose a new rating prediction approach based on textual customer reviews. As feature extraction techniques, we use term frequency (TF) and term frequency-inverse document frequency (TF-IDF), and we compare the results obtained by the various classifiers with each technique across large-scale, serial trials. For the decision phase, we apply the Fuzzy Decision by Opinion Score Method (FDOSM), one of the most recent methods for multi-criteria decision-making. To evaluate and quantify the performance of the ML methods, we apply six standard measures: precision, accuracy, recall, F-score, AUC, and the Kappa measure. The experimental results indicate that the SVM classifier performs best, with a precision of 88.333%, followed by the FDOSM method, with 0.800 for the same measure.
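As an illustration of the six evaluation measures named in the abstract, the following is a minimal sketch in plain Python for a binary positive/negative task. It is not the authors' implementation: the function names (`confusion_counts`, `roc_auc`, `evaluate`), the 1 = positive / 0 = negative label encoding, and the reading of the "Kappa measure" as Cohen's kappa are all assumptions made for this example, and the toy labels and scores are invented.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, assuming 1 = positive and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """Compute precision, accuracy, recall, F-score, AUC, and Cohen's kappa (assumed reading of 'Kappa')."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Agreement expected by chance, from the predicted and true class marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "precision": precision,
        "accuracy": accuracy,
        "recall": recall,
        "f_score": f_score,
        "auc": roc_auc(y_true, scores),
        "kappa": kappa,
    }


# Toy data (invented): 4 positive and 4 negative reviews with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
metrics = evaluate(y_true, y_pred, scores)
print(metrics)
```

On this toy data, precision, accuracy, recall, and F-score all equal 0.75, kappa is 0.5, and AUC is 0.9375. AUC is computed from the raw scores rather than the thresholded predictions, which is why it can differ from the other measures for the same classifier.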

Notes

  1. www.kaggle.com.

Author information

Contributions

MAA, DHA, and MNO contributed to the study’s conception and design. The first draft of the manuscript was written by all the authors. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mustafa Abdalrassual Jassim.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jassim, M.A., Abd, D.H. & Omri, M.N. Machine learning-based new approach to films review. Soc. Netw. Anal. Min. 13, 40 (2023). https://doi.org/10.1007/s13278-023-01042-7

  • DOI: https://doi.org/10.1007/s13278-023-01042-7
