Automating fake news detection system using multi-level voting model

Soft Computing · Methodologies and Application

Abstract

Online fake news has gained increasing prominence in shaping how news stories spread online. Misleading or unreliable information in the form of videos, posts, articles and URLs is widely disseminated through popular social media platforms such as Facebook and Twitter. As a result, editors and journalists need new tools that help them speed up the verification of content originating from social media. Motivated by the need for automated fake news detection, this paper aims to determine which classification model most accurately identifies fake news under each of three feature extraction techniques: Term Frequency–Inverse Document Frequency (TF–IDF), Count-Vectorizer (CV) and Hashing-Vectorizer (HV). The paper also proposes a novel multi-level voting ensemble model. The proposed system has been tested on three datasets using twelve machine learning (ML) classifiers, which are combined according to their false prediction ratio. Based on their performance metrics, the Passive Aggressive, Logistic Regression and Linear Support Vector Classifier (LinearSVC) models individually perform best with the TF–IDF, CV and HV feature extraction approaches, respectively. The proposed model outperforms the Passive Aggressive model by 0.8%, the Logistic Regression model by 1.3% and the LinearSVC model by 0.4% using TF–IDF, CV and HV, respectively. The proposed system can also be used to detect fake textual content on online social media websites.
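The abstract names three feature extraction techniques. To show how they differ, here is a minimal, dependency-free Python sketch. It is not the authors' code. The functions `tokenize`, `count_vectorize`, `tfidf_vectorize` and `hashing_vectorize`, as well as the toy documents, are hypothetical.

- The TF–IDF weighting uses the smoothed-IDF, L2-normalised form that scikit-learn's `TfidfVectorizer` applies by default.
- The hashing step uses CRC32 as a simple stand-in for the MurmurHash3 function used by scikit-learn's `HashingVectorizer`.

```python
import math
import zlib
from collections import Counter


def tokenize(doc):
    # Naive lowercase/whitespace tokenizer; a real pipeline would also strip punctuation.
    return doc.lower().split()


def count_vectorize(docs):
    """Raw term counts over a vocabulary learned from `docs` (the CV idea)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows


def tfidf_vectorize(docs):
    """Counts reweighted by smoothed IDF, then L2-normalised per document."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so rarer terms get larger weights.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


def hashing_vectorize(docs, n_features=8):
    """Counts bucketed by a hash of each token; no vocabulary is stored (the HV idea)."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            # CRC32 is a stand-in hash here; distinct tokens may collide in one bucket.
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


# Hypothetical toy corpus for illustration only.
docs = ["breaking shocking news", "official news report", "shocking shocking claim"]
vocab, cv = count_vectorize(docs)
_, tfidf = tfidf_vectorize(docs)
hv = hashing_vectorize(docs)
print(vocab)  # ['breaking', 'claim', 'news', 'official', 'report', 'shocking']
print(cv)     # [[1, 0, 1, 0, 0, 1], [0, 0, 1, 1, 1, 0], [0, 1, 0, 0, 0, 2]]
```

CV and TF–IDF both depend on a vocabulary learned from the training text. Hashing instead maps tokens straight into a fixed number of columns. This trades occasional collisions for constant memory and no fitting step.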
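The abstract states that the classifiers are combined according to their false prediction ratio, but it does not give the procedure. The following Python sketch is a hypothetical two-level scheme, not the paper's algorithm. It assumes:

- the false prediction ratio is the fraction of misclassified validation samples;
- classifiers with lower ratios are grouped together and trusted first.

The grouping rule, `group_size`, the tie-breaking rule, the classifier names "A" to "F" and all predictions are illustrative assumptions.

```python
from collections import Counter


def false_prediction_ratio(preds, labels):
    """Share of validation samples a classifier misclassifies: (FP + FN) / N."""
    wrong = sum(p != y for p, y in zip(preds, labels))
    return wrong / len(labels)


def majority(votes):
    """Most frequent label; a tie goes to the earliest (best-ranked) voter."""
    counts = Counter(votes)
    top = max(counts.values())
    return next(v for v in votes if counts[v] == top)


def multi_level_vote(val_preds, val_labels, test_preds, group_size=3):
    """Hypothetical two-level vote, NOT the paper's published algorithm.

    Level 1: classifiers are ranked by false prediction ratio (lowest first),
    split into groups of `group_size`, and each group takes a majority vote.
    Level 2: the group outputs take a final majority vote per sample.
    """
    ranked = sorted(
        val_preds, key=lambda name: false_prediction_ratio(val_preds[name], val_labels)
    )
    groups = [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]
    n_samples = len(test_preds[ranked[0]])
    final = []
    for i in range(n_samples):
        group_votes = [majority([test_preds[name][i] for name in group]) for group in groups]
        final.append(majority(group_votes))
    return ranked, final


# Hypothetical predictions from six classifiers (1 = fake, 0 = real).
val_labels = [1, 0, 1, 0]
val_preds = {
    "A": [1, 0, 1, 0],  # ratio 0.00
    "B": [1, 0, 1, 1],  # ratio 0.25
    "C": [1, 1, 1, 0],  # ratio 0.25
    "D": [0, 1, 1, 0],  # ratio 0.50
    "E": [0, 0, 0, 0],  # ratio 0.50
    "F": [0, 1, 0, 1],  # ratio 1.00
}
test_preds = {
    "A": [1, 0, 1], "B": [1, 0, 0], "C": [0, 0, 1],
    "D": [1, 1, 0], "E": [0, 1, 0], "F": [0, 1, 1],
}
ranked, final = multi_level_vote(val_preds, val_labels, test_preds)
print(ranked)  # ['A', 'B', 'C', 'D', 'E', 'F']
print(final)   # [1, 0, 1]
```

With these toy inputs the groups are {A, B, C} and {D, E, F}. Every level-2 tie between the two groups resolves in favour of the better-ranked group. Two other rules would be equally plausible: using an odd number of groups, or weighting each vote by one minus its ratio.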

Figures: Fig. 1 to Fig. 14

Acknowledgements

This publication is an outcome of R&D work undertaken in a project under the Visvesvaraya PhD Scheme of the Ministry of Electronics and Information Technology, Government of India, implemented by Digital India Corporation (formerly Media Lab Asia).

Funding

Funding was provided by Digital India Corporation (formerly Media Lab Asia) (Grant No. U72900MH2001NPL133410).

Author information

Corresponding author

Correspondence to Sawinder Kaur.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kaur, S., Kumar, P. & Kumaraguru, P. Automating fake news detection system using multi-level voting model. Soft Comput 24, 9049–9069 (2020). https://doi.org/10.1007/s00500-019-04436-y

  • DOI: https://doi.org/10.1007/s00500-019-04436-y
