
Determining an Optimal Data Classification Model for Credibility-Based Fake News Detection

Article published in The Review of Socionetwork Strategies

Abstract

Fake news is a persistent problem in today’s social-media-enabled world. It can be classified using different methods, yet predicting and detecting it has proven challenging even for machine learning algorithms. This research investigates nine machine learning algorithms to understand their performance on Credibility-Based Fake News Detection. The study uses a standard dataset with features relating to the credibility of news publishers, and these features are analysed using each of the nine algorithms. The results are evaluated using four methodologies: the Receiver Operating Characteristic (ROC) curve, the precision-recall curve, the Lift curve, and numerical metrics. The analysis reveals that performance varies across the nine methods. On the selected dataset, the Two-Class Boosted Decision Tree proved best suited to Credibility-Based Fake News Detection. Building on this conclusion, the paper presents its main contribution: a deep analysis of the excellent performance of the Two-Class Boosted Decision Tree for Credibility-Based Fake News Detection.
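
To make the evaluation views named in the abstract concrete, the following is a minimal sketch of how ROC points and their area, precision and recall at a threshold, and lift in the top-scored fraction can be computed from a classifier's scores. It is an illustration only: the labels and scores are made-up toy values rather than the study's data, the function names are hypothetical, and none of the paper's models or results is reproduced.

```python
from typing import List, Tuple

# Toy data (hypothetical, not from the paper): 1 = fake, 0 = real.
# Each score is an assumed classifier confidence that the item is fake.
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points for a threshold sweep."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def precision_recall(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when items scoring >= threshold are flagged as fake."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lift_at(labels: List[int], scores: List[float], fraction: float) -> float:
    """Rate of fake items in the top-scored fraction, relative to the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


print("ROC AUC:", auc(roc_points(labels, scores)))
print("Precision, recall at 0.5:", precision_recall(labels, scores, 0.5))
print("Lift in top 20%:", lift_at(labels, scores, 0.2))
```

Sweeping the threshold in `precision_recall` over all observed scores would trace a precision-recall curve, and evaluating `lift_at` over a range of fractions would trace a lift curve.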




Author information


Corresponding author

Correspondence to Amit Neil Ramkissoon.

Ethics declarations

Conflicts of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.



About this article


Cite this article

Ramkissoon, A.N., Mohammed, S. & Goodridge, W. Determining an Optimal Data Classification Model for Credibility-Based Fake News Detection. Rev Socionetwork Strat 15, 347–380 (2021). https://doi.org/10.1007/s12626-021-00093-6


  • DOI: https://doi.org/10.1007/s12626-021-00093-6
