Abstract
Fake news is a persistent problem in today’s social-media-enabled world. It can be classified using a variety of methods, yet predicting and detecting it has proven challenging even for machine learning algorithms. This research investigates nine such algorithms to understand their performance in Credibility-Based Fake News Detection. The study uses a standard dataset with features relating to the credibility of news publishers, and each of the nine algorithms is applied to these features. The experimental results are analysed using four evaluation methodologies: the Receiver Operating Characteristic (ROC) curve, the precision-recall curve, the Lift curve, and numerical metrics. The analysis shows that performance varies across the nine methods. On the selected dataset, the Two-Class Boosted Decision Tree proved best suited to Credibility-Based Fake News Detection. Building on this conclusion, the main contribution of this paper is a deep analysis of the strong performance of the Two-Class Boosted Decision Tree on this task.
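As a rough illustration of the evaluation methodologies named above, the sketch below derives ROC, precision-recall, and lift points from a ranked list of classifier scores, and summarises the ROC curve by its area. It is not the pipeline used in this study: the labels and scores are invented toy values, the function names `curve_points` and `auc` are hypothetical, and treating the area under the ROC curve as one of the "numerical metrics" is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def curve_points(labels: List[int], scores: List[float]):
    """Sweep a cut-off down the ranked scores and collect curve points.

    labels: 1 = fake, 0 = real (toy convention, assumed here).
    Assumes no tied scores, so each item is its own cut-off.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n = len(labels)
    total_pos = sum(labels)
    total_neg = n - total_pos
    base_rate = total_pos / n  # precision of a random ranking

    roc: List[Point] = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    pr: List[Point] = []             # (recall, precision)
    lift: List[Point] = []           # (fraction of items flagged, lift)
    tp = fp = 0
    for flagged, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
        else:
            fp += 1
        recall = tp / total_pos
        precision = tp / flagged
        roc.append((fp / total_neg, recall))
        pr.append((recall, precision))
        lift.append((flagged / n, precision / base_rate))
    return roc, pr, lift


def auc(points: List[Point]) -> float:
    """Area under a curve given as ordered (x, y) points, by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented toy data: eight articles with a credibility-based "fake" score.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]

roc, pr, lift = curve_points(labels, scores)
roc_auc = auc(roc)
print("ROC AUC:", roc_auc)
print("Lift when flagging the top 25%:", lift[1][1])
```

A lift above 1 at a given cut-off means the ranking surfaces fake items more densely than random selection would. A ROC area of 0.5 corresponds to a random ranking and 1.0 to a perfect one.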
Ethics declarations
Conflicts of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Ramkissoon, A.N., Mohammed, S. & Goodridge, W. Determining an Optimal Data Classification Model for Credibility-Based Fake News Detection. Rev Socionetwork Strat 15, 347–380 (2021). https://doi.org/10.1007/s12626-021-00093-6
DOI: https://doi.org/10.1007/s12626-021-00093-6