Automating fake news detection system using multi-level voting model

Kaur, Sawinder; Kumar, Parteek; Kumaraguru, Ponnurangam

doi:10.1007/s00500-019-04436-y

Methodologies and Application
Published: 02 November 2019

Volume 24, pages 9049–9069, (2020)
Soft Computing

Abstract

Online fake news has become increasingly prominent in how news stories spread and take shape online. Misleading or unreliable information in the form of videos, posts, articles and URLs is widely disseminated through popular social media platforms such as Facebook and Twitter. As a result, editors and journalists need new tools that help them speed up the verification of content originating from social media. Motivated by the need for automated fake news detection, this paper investigates which classification model identifies fake features most accurately under three feature extraction techniques: Term Frequency–Inverse Document Frequency (TF-IDF), Count-Vectorizer (CV) and Hashing-Vectorizer (HV). It also proposes a novel multi-level voting ensemble model. The proposed system has been tested on three datasets using twelve classifiers, which are combined based on their false prediction ratio. Individually, the Passive Aggressive, Logistic Regression and Linear Support Vector Classifier (LinearSVC) models perform best with the TF-IDF, CV and HV feature extraction approaches, respectively, based on their performance metrics. The proposed model outperforms the Passive Aggressive model by 0.8% (TF-IDF), the Logistic Regression model by 1.3% (CV) and the LinearSVC model by 0.4% (HV). The proposed system can also be used to predict fake textual content from online social media websites.
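
The abstract says the classifiers are combined based on their false prediction ratio but does not give the combination rule. The sketch below is one possible reading, not the authors' method: it assumes the false prediction ratio is the share of wrong predictions on a validation set, that classifiers are ranked by it and split into groups of three, and that a majority vote is taken first inside each group and then across groups. The function names, the grouping rule, the classifier names `clf_d` to `clf_f` and all toy predictions are hypothetical.

```python
from collections import Counter


def false_prediction_ratio(predictions, labels):
    """Fraction of validation samples a classifier labelled incorrectly (assumed definition)."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def majority_vote(votes):
    """Return the most common label; on a tie, the label seen first wins."""
    return Counter(votes).most_common(1)[0][0]


def multi_level_vote(val_preds, val_labels, test_preds, group_size=3):
    """Two-level majority vote (assumed scheme, not taken from the paper).

    val_preds and test_preds map a classifier name to its list of predictions.
    """
    # Rank classifiers from lowest to highest false prediction ratio.
    ranked = sorted(
        val_preds,
        key=lambda name: false_prediction_ratio(val_preds[name], val_labels),
    )
    # Split the ranking into consecutive groups of group_size classifiers.
    groups = [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]
    n_samples = len(next(iter(test_preds.values())))
    final = []
    for i in range(n_samples):
        # Level 1: vote inside each group.
        level_one = [
            majority_vote([test_preds[name][i] for name in group])
            for group in groups
        ]
        # Level 2: vote across the group decisions.
        final.append(majority_vote(level_one))
    return final


# Toy data: hypothetical predictions, not results from the paper.
val_labels = ["FAKE", "REAL", "FAKE", "REAL"]
val_preds = {
    "passive_aggressive": ["FAKE", "REAL", "FAKE", "REAL"],
    "logistic_regression": ["FAKE", "REAL", "FAKE", "FAKE"],
    "linear_svc": ["FAKE", "REAL", "REAL", "REAL"],
    "clf_d": ["REAL", "REAL", "FAKE", "FAKE"],
    "clf_e": ["REAL", "FAKE", "FAKE", "REAL"],
    "clf_f": ["REAL", "FAKE", "REAL", "FAKE"],
}
test_preds = {
    "passive_aggressive": ["FAKE", "REAL", "FAKE"],
    "logistic_regression": ["FAKE", "REAL", "REAL"],
    "linear_svc": ["FAKE", "FAKE", "FAKE"],
    "clf_d": ["FAKE", "REAL", "REAL"],
    "clf_e": ["REAL", "REAL", "REAL"],
    "clf_f": ["FAKE", "FAKE", "REAL"],
}
final = multi_level_vote(val_preds, val_labels, test_preds)
print(final)
```

Because the groups are built from the ranked list, a tie at the second level falls to the group with the lower false prediction ratio. That tie-breaking choice is a design decision of this sketch rather than something the abstract states.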

Acknowledgements

This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics and Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).

Funding

Funding was provided by Digital India Corporation (formerly Media Lab Asia) (Grant No. U72900MH2001NPL133410).

Author information

Authors and Affiliations

Doctoral Research Lab-II, Computer Science and Engineering Department, TIET, Patiala, India
Sawinder Kaur
Computer Science and Engineering Department, TIET, Patiala, India
Parteek Kumar
Computer Science and Engineering Department, IIIT, Delhi, India
Ponnurangam Kumaraguru

Corresponding author

Correspondence to Sawinder Kaur.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kaur, S., Kumar, P. & Kumaraguru, P. Automating fake news detection system using multi-level voting model. Soft Comput 24, 9049–9069 (2020). https://doi.org/10.1007/s00500-019-04436-y

Published: 02 November 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00500-019-04436-y
