A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification

Shiplu, Ariful Islam; Rahman, Md. Mostafizer; Watanobe, Yutaka

doi:10.1007/978-3-031-58502-9_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14516)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

In the modern era, data is being generated at an exponentially increasing rate. It comes from domains such as education, business, and e-commerce, and predominantly from social media platforms such as Twitter, YouTube, Facebook, and Instagram. Amid this proliferation of content, user comments have emerged as a crucial element, giving users a place to express opinions, praise, and criticism. However, the abundance of user feedback brings a persistent problem: undesirable comments that provoke negative emotional responses or are tedious and irrelevant. Identifying and removing such comments effectively is a major challenge, which calls for a robust comment classification model. To address it, we investigate a variety of machine learning models for comment classification, including Decision Trees, Random Forests (RF), Naive Bayes, K-Nearest Neighbors, Gradient Boosting, AdaBoost, Logistic Regression, and Support Vector Machines (SVM). We further combine these models with fundamental voting techniques, namely Hard-Voting, Averaging, and Soft-Voting, to improve classification performance. The objective is to learn the characteristics of text comments and classify them with higher accuracy than prior research. In this paper, we propose a robust ensemble model, RF+AdaBoost+SVM+Soft-Voting, specifically designed for comment classification. The proposed ensemble model achieved an accuracy of approximately 98% for comment classification on a YouTube dataset.
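The abstract names three ways of combining base classifiers (Hard-Voting, Averaging, and Soft-Voting) without defining them. The following Python sketch assumes common textbook definitions: hard voting takes the majority of the predicted labels, averaging takes the unweighted mean of the class probabilities and picks the most probable class, and soft voting does the same with per-model weights. The function names (`hard_vote`, `average_vote`, `soft_vote`), the probability values, and the weights are hypothetical and chosen only for illustration; this is not the authors' implementation, and in practice the probabilities would come from fitted models such as the RF, AdaBoost, and SVM members of the proposed ensemble.

```python
from collections import Counter
from typing import Optional, Sequence

# One probability vector per base model for a single comment, e.g.
# [P(class 0), P(class 1)] in a hypothetical binary spam/non-spam setup.
ProbaRows = Sequence[Sequence[float]]


def _argmax(scores: Sequence[float]) -> int:
    # max() returns the first maximal index, so ties go to the lowest class index.
    return max(range(len(scores)), key=lambda k: scores[k])


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over predicted labels; ties go to the smallest label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def average_vote(probas: ProbaRows) -> int:
    """Unweighted mean of the models' class probabilities, then argmax."""
    n_classes = len(probas[0])
    mean = [sum(p[k] for p in probas) / len(probas) for k in range(n_classes)]
    return _argmax(mean)


def soft_vote(probas: ProbaRows, weights: Optional[Sequence[float]] = None) -> int:
    """Weighted mean of class probabilities, then argmax (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(probas)
    total = float(sum(weights))
    n_classes = len(probas[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]
    return _argmax(combined)


if __name__ == "__main__":
    # Made-up probabilities for one comment from three base models
    # (standing in for RF, AdaBoost, and SVM).
    probas = [[0.40, 0.60], [0.55, 0.45], [0.52, 0.48]]
    labels = [_argmax(p) for p in probas]          # [1, 0, 0]
    print(hard_vote(labels))                       # 0: two of three models pick class 0
    print(average_vote(probas))                    # 1: mean P(class 1) is 0.51
    print(soft_vote(probas, weights=[1, 2, 2]))    # 0: up-weighting models 2 and 3 flips it
```

The made-up numbers show why the combination rule matters: the two models that narrowly favor class 0 win a hard vote, the first model's stronger preference for class 1 wins once probabilities are averaged, and weights can flip the result back. scikit-learn packages the same idea as `VotingClassifier` with `voting="hard"` or `voting="soft"` and an optional `weights` list; for soft voting, an `SVC` member must be created with `probability=True` so that it exposes `predict_proba`.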

A. I. Shiplu and M. M. Rahman contributed equally to this research.

Author information

Authors and Affiliations

Dhaka University of Engineering & Technology, Gazipur, Bangladesh
Ariful Islam Shiplu & Md. Mostafizer Rahman
The University of Aizu, Aizuwakamatsu, Japan
Md. Mostafizer Rahman & Yutaka Watanobe

Corresponding author

Correspondence to Md. Mostafizer Rahman.

Editor information

Editors and Affiliations

National Institute of Technology Delhi, New Delhi, Delhi, India
Shelly Sachdeva
University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
Yutaka Watanobe

About this paper

Cite this paper

Shiplu, A.I., Rahman, M.M., Watanobe, Y. (2024). A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification. In: Sachdeva, S., Watanobe, Y. (eds) Big Data Analytics in Astronomy, Science, and Engineering. BDA 2023. Lecture Notes in Computer Science, vol 14516. Springer, Cham. https://doi.org/10.1007/978-3-031-58502-9_10

DOI: https://doi.org/10.1007/978-3-031-58502-9_10
Published: 27 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58501-2
Online ISBN: 978-3-031-58502-9
eBook Packages: Computer Science, Computer Science (R0)