A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification

  • Conference paper
Big Data Analytics in Astronomy, Science, and Engineering (BDA 2023)

Abstract

In the modern era, we are immersed in an ever-expanding flow of data. Data is generated in domains such as education, business, and e-commerce, and predominantly on social media platforms such as Twitter, YouTube, Facebook, and Instagram. Amid this proliferation of content, user comments have emerged as a crucial element, serving as a channel for opinions, commendations, and critiques. However, this abundance of user feedback carries a persistent problem: undesirable comments that elicit negative emotional responses or are tedious and irrelevant. Effectively identifying and removing such comments is a major challenge. This research addresses the need for a robust comment classification model. To this end, a comprehensive investigation is conducted using a variety of machine learning models for comment classification: Decision Trees, Random Forests (RF), Naive Bayes, K-Nearest Neighbors, Gradient Boosting, AdaBoost, Logistic Regression, and Support Vector Machines (SVM). Furthermore, fundamental voting techniques (Hard-Voting, Averaging, and Soft-Voting) are combined with these models to improve classification performance. The objective is to discern the characteristics of text comments and classify them with higher accuracy than prior research. In this paper, we propose a robust ensemble model, RF+AdaBoost+SVM+Soft-Voting, specifically designed for comment classification. The results indicate that the proposed ensemble model achieved an accuracy of approximately 98% for comment classification on a YouTube dataset.
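The difference between hard and soft voting named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class labels, the per-model probabilities, and the helper names `hard_vote` and `soft_vote` are all hypothetical, chosen only to show how the two rules can disagree on the same comment.

```python
from collections import Counter

# Hypothetical class labels; the preview does not state the dataset's labels.
LABELS = ["ham", "spam"]

# Made-up probabilities [P(ham), P(spam)] from three classifiers for one comment.
probas = {
    "rf": [0.55, 0.45],
    "adaboost": [0.60, 0.40],
    "svm": [0.10, 0.90],
}


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(probas):
    """Each model casts one vote for its most likely class; the majority wins."""
    votes = [LABELS[argmax(p)] for p in probas.values()]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Average the (optionally weighted) class probabilities, then take the argmax."""
    names = list(probas)
    weights = weights or {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    avg = [
        sum(weights[name] * probas[name][k] for name in names) / total
        for k in range(len(LABELS))
    ]
    return LABELS[argmax(avg)], avg


hard_label = hard_vote(probas)
soft_label, soft_avg = soft_vote(probas)
print(hard_label, soft_label, soft_avg)
```

In this made-up case two of the three models lean weakly towards "ham", so hard voting returns "ham", while the one confident "spam" estimate dominates the averaged probabilities and soft voting returns "spam". scikit-learn offers the same rules through `VotingClassifier` with `voting="hard"` or `voting="soft"`; whether the authors used that class is not stated in this abstract.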

A. I. Shiplu and M. M. Rahman contributed equally to this research.

Author information

Corresponding author

Correspondence to Md. Mostafizer Rahman.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shiplu, A.I., Rahman, M.M., Watanobe, Y. (2024). A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification. In: Sachdeva, S., Watanobe, Y. (eds) Big Data Analytics in Astronomy, Science, and Engineering. BDA 2023. Lecture Notes in Computer Science, vol 14516. Springer, Cham. https://doi.org/10.1007/978-3-031-58502-9_10

  • DOI: https://doi.org/10.1007/978-3-031-58502-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-58501-2

  • Online ISBN: 978-3-031-58502-9

  • eBook Packages: Computer Science, Computer Science (R0)
