Survey of fake news detection using machine intelligence approach

doi:10.1016/j.datak.2022.102118

Data & Knowledge Engineering

Volume 144, March 2023, 102118

https://doi.org/10.1016/j.datak.2022.102118

Abstract

With information of all kinds spreading widely through digital platforms, it is of the utmost importance that people be able to distinguish true information from false. Fake news is a major problem in our society: we cannot tell whether a news item is fake or real without knowledge or proof about it. To address this problem, we built a small model that helps detect fake news, using articles collected from the internet, each of which we labeled as either fake or true. We trained on these articles using different machine learning algorithms, namely Passive Aggressive Classifier, Naïve Bayes, Logistic Regression, Decision Tree, Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT), and compared the results. In our experiments, the Decision Tree algorithm achieved 99.6% accuracy and LSTM obtained 99.8% recall for fake news detection. The Passive Aggressive Classifier performs excellently on a large dataset.

Introduction

With the advent of globalization and the rapid development of online platforms (including Facebook and Twitter), an avenue for information exchange has opened up that has never before been seen in human history [1], [2]. The propagation of false news also has a tremendous impact on the rest of the world, and fake news is propagated via social media platforms such as Facebook and Twitter [3], [4]. Our ability to make judgments is largely determined by the knowledge we absorb; our viewpoint is influenced by the information we consume. There is mounting evidence that people have responded irrationally to news that afterward proved to be false [5], [6].

This machine learning (ML) based research article on identifying false news is concerned with both fake and genuine news [7]. Using scikit-learn (the sklearn module), a free Python ML library, and its term frequency-inverse document frequency (TF-IDF) vectorizer, we can quantify how important each token in our dataset is. Then, we initialize the ML models and fit them on the tokens. We consider the following ML models: Passive Aggressive Classifier, Naïve Bayes, Logistic Regression, Decision Tree, LSTM and BERT.
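The vectorize-then-fit workflow described above can be sketched as follows. This is a minimal illustration only, not the authors' code: the six-sentence corpus, the `classify` helper and all parameter values are assumptions invented for the example, and only three of the classical models named in the text are shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Toy corpus invented for illustration; it is NOT the paper's dataset.
texts = [
    "scientists confirm vaccine passed clinical trials",
    "government publishes official budget report",
    "central bank announces interest rate decision",
    "miracle cure doctors hate revealed in secret video",
    "shocking secret celebrity clone exposed",
    "aliens secretly control the stock market",
]
labels = ["REAL", "REAL", "REAL", "FAKE", "FAKE", "FAKE"]

# TF-IDF turns each article into a sparse vector with one weight per token.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Initialize several classical models and fit each on the same features.
models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
for model in models.values():
    model.fit(X, labels)


def classify(text, model_name="logistic_regression"):
    """Label one unseen article with a fitted model (hypothetical helper)."""
    features = vectorizer.transform([text])  # reuse the fitted vocabulary
    return models[model_name].predict(features)[0]
```

Calling `classify("secret miracle cure exposed")` returns either `"FAKE"` or `"REAL"`. On a corpus this small the answer says nothing about real-world performance; the point is only that new text must pass through the same fitted vectorizer before prediction.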

In the end, the accuracy score and the confusion matrix tell us how well our model fares. They give us an approximate measure of the model, indicating whether it is performing properly. After that, we can accept user input and determine whether it is fake or real [8], [9], [10].
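As a rough illustration of these two measures, the sketch below scores a set of predictions with scikit-learn's metrics functions. The `y_true` and `y_pred` lists are made-up values chosen for the example, not results from the paper.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical ground truth and model output for eight articles.
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE", "REAL", "FAKE"]

# Fraction of articles labeled correctly (6 of 8 here).
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["FAKE", "REAL"])

# Recall for the FAKE class: the share of fake articles that were caught.
fake_recall = recall_score(y_true, y_pred, pos_label="FAKE")

print(f"accuracy = {accuracy:.2f}")        # accuracy = 0.75
print(cm)                                  # [[3 1]
                                           #  [1 3]]
print(f"recall (FAKE) = {fake_recall:.2f}")  # recall (FAKE) = 0.75
```

Accuracy alone can hide an imbalance between the two kinds of error. The confusion matrix separates fake articles that slipped through (top right) from real articles wrongly flagged (bottom left).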

The remainder of the paper is organized as follows. The related work section explains the different types of algorithms. The fake news problem is stated early on; the solution to this problem and the algorithms used are presented in the evaluation section, followed by the final results in the results and discussion section. Finally, the conclusion is presented.

Section snippets

Related work

This section deals with the basics of the existing algorithms.

Evaluation

Social media sites are powerful and beneficial for discussing issues that are vital to society, but their use also needs to follow ethics. It is our responsibility to maintain veracity by spreading correct news.

Results and discussion

Each time we trained the model, the false news detection algorithm returned a consistent accuracy score. When we train our model on a larger set of data, we obtain our best test results. The approximate value is simple to calculate using a basic machine learning idea: we take the mean of the generated vector and compare it with our training data to decide, according to whether positive outcomes predominate, whether the news is true or false.

According to accuracy,

Conclusions

We categorized the news into a word set and identified the proper key terms to create an appropriate model. We used different ML algorithms to train our model. A confusion matrix is used to measure performance, so that correct decisions can be made about whether articles are labeled real or fake.

There are several outstanding challenges in the identification of fake news that researchers must address. Identifying essential aspects involved in the distribution of news, for example, can help to minimize

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We downloaded the data from Kaggle and [ISDDC 2017] ISOT [27]. We are grateful to GitHub, which provided us with procedures for various parts of the code. Through them we were able to learn about these current developments and put them to use in this project.

The BERT model is very large because of its training structure. Owing to its size and the number of weights that need to be updated, training is slow. We therefore used a small dataset for it, which we have uploaded on

Aishika Pal was born in 1999 in West Bengal, India. She is completing her undergraduate B-Tech degree in the field of Information Technology from Dr. B C Roy Engineering College, Durgapur. Her research interests include Cyber Security, Machine Learning and Artificial Intelligence.

References (28)

Aldwairi M. et al., Detecting Fake News in Social Media Networks (2018)
Marr B., Coronavirus fake news: how Facebook, Twitter, and Instagram are tackling the problem
Shu K. et al., Fake news detection on social media: a data mining perspective
N. Ruchansky, S. Seo, Y. Liu, CSI: A Hybrid Deep Model for Fake News Detection, in: Proceedings of the 2017 ACM on...
Okoro E.M. et al., A hybrid approach to fake news detection on social media, Niger. J. Technol. (2018)
Kwon Sejeong et al., Prominent features of rumor propagation in online social media
Pérez-Rosas V. et al., Automatic detection of fake news (2017)
Vosoughi S. et al., The spread of true and false news online, Science (2018)
Hua J. et al., Corona virus (covid-19) infodemic and emerging issues through a data lens: the case of China, Int. J. Environ. Res. Public Health (2020)
Wadhwa L., Detecting fake political news online (2019)
M. Dranik, V. Mesyura, Fake News Detection Using Naive Bayes Classifier. http://dx.doi.org/10.1109/UKRCON.2017.8100379....
Tacchini E. et al., Automated fake news detection in social networks (2017)
Ahmad I. et al., Fake News Detection using Machine Learning Ensemble Methods (2020)
Douglas A., News consumption and the new electronic media, Int. J. Press/Polit. (2006)

Pranav was born in 2000 in Bihar, India. He is completing his undergraduate B-Tech degree in the field of Information Technology from Dr. B C Roy Engineering College, Durgapur. His research interests include Cryptography, Artificial Intelligence, Cyber Security and Data Science.

Moumita Pradhan (Dr.) was born in West Bengal, India. She received her B. Tech in Computer Science and Engineering in 2005 and her M. Tech from NIT Durgapur in 2009. She completed her Ph.D. in Computer Science and Engineering in 2019 at NIT Durgapur, India. Her fields of research are soft computing, swarm intelligence, Machine Learning, Artificial Intelligence, economic load dispatch, and hydrothermal scheduling.
