Abstract
Data volumes are growing rapidly across domains such as education, business, and e-commerce, and above all on social media platforms such as Twitter, YouTube, Facebook, and Instagram. On these platforms, user comments are a primary channel for expressing opinions, praise, and criticism. A persistent problem is the presence of undesirable comments that provoke negative emotional responses or are tedious and irrelevant, and identifying and removing them effectively remains a major challenge. This research addresses the need for a robust comment classification model. We conduct a comprehensive investigation of machine learning models for comment classification, including Decision Trees, Random Forests (RF), Naive Bayes, K-Nearest Neighbors, Gradient Boosting, AdaBoost, Logistic Regression, and Support Vector Machines (SVM). We further combine these models using fundamental voting techniques, namely Hard-Voting, Averaging, and Soft-Voting, to improve classification performance. The objective is to learn the characteristics of text comments and classify them with higher accuracy than prior research. We propose an ensemble model, RF+AdaBoost+SVM+Soft-Voting, designed specifically for comment classification. The proposed ensemble achieves an accuracy of approximately 98% for comment classification on a YouTube dataset.
A. I. Shiplu and M. M. Rahman contributed equally to this research.
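To make the difference between the voting schemes named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the class labels ("undesirable", "acceptable"), the probability values, and the helper names `hard_vote` and `soft_vote` are all hypothetical, and the three probability dictionaries merely stand in for the outputs of the RF, AdaBoost, and SVM members of the proposed ensemble.

```python
from collections import Counter


def hard_vote(labels):
    """Hard voting: return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Soft voting: average class probabilities, then pick the top class.

    probas  -- list of {class_label: probability} dicts, one per model
    weights -- optional per-model weights (defaults to equal weighting)
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    averaged = {
        label: sum(w * p[label] for w, p in zip(weights, probas)) / total
        for label in probas[0]
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical per-model probabilities for a single comment
# (stand-ins for RF, AdaBoost, and SVM; not values from the paper).
model_probas = [
    {"undesirable": 0.90, "acceptable": 0.10},
    {"undesirable": 0.45, "acceptable": 0.55},
    {"undesirable": 0.40, "acceptable": 0.60},
]

# Each model's own predicted label is its most probable class.
model_labels = [max(p, key=p.get) for p in model_probas]

hard_label = hard_vote(model_labels)
soft_label, averaged = soft_vote(model_probas)

print("hard voting:", hard_label)
print("soft voting:", soft_label, averaged)
```

In this toy case two of the three models lean slightly towards "acceptable", so hard voting returns "acceptable", while the one confident model pulls the averaged probability for "undesirable" above 0.5, so soft voting returns "undesirable". Soft voting therefore takes each model's confidence into account, whereas hard voting counts only the predicted labels.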
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shiplu, A.I., Rahman, M.M., Watanobe, Y. (2024). A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification. In: Sachdeva, S., Watanobe, Y. (eds) Big Data Analytics in Astronomy, Science, and Engineering. BDA 2023. Lecture Notes in Computer Science, vol 14516. Springer, Cham. https://doi.org/10.1007/978-3-031-58502-9_10
DOI: https://doi.org/10.1007/978-3-031-58502-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58501-2
Online ISBN: 978-3-031-58502-9
eBook Packages: Computer Science (R0)