Semantic difference-based feature extraction technique for fake news detection

Gorai, Joy; Shaw, Dilip Kumar

doi:10.1007/s11227-024-06307-2

Published: 26 June 2024

Volume 80, pages 22631–22653, (2024)

The Journal of Supercomputing

Joy Gorai¹ & Dilip Kumar Shaw¹

Abstract

The rise of fake news poses a critical challenge to societal stability and underscores the urgent need for efficient detection systems. This study introduces an approach that identifies fake news by exploiting semantic discrepancies between the titles and the content of news articles. The method first summarizes each article's content using a feature selection (FS) technique, then measures how far the title is from the summarized content in vector space using three measures: cosine similarity, Jaccard distance, and Euclidean distance. Using several measures captures different aspects of semantic dissimilarity and so supports a more thorough examination. The resulting values are combined into composite features, which are used to train various machine learning (ML) and deep learning (DL) models on three distinct news article datasets. The approach achieves an accuracy of nearly 99.9%. The results underscore the effectiveness of leveraging semantic differences between article titles and content, offering a robust alternative to methods that focus solely on individual textual components. The technique not only improves accuracy but also provides a scalable way to combat fake news in digital media.
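
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the vectorization or the FS-based summarization, so this example assumes raw term-count vectors over a shared vocabulary, assumes the content has already been summarized, and uses hypothetical names (`tokenize`, `distance_features`). It shows only how the three measures between a title and a summary might be combined into one feature tuple.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distance_features(title, summary):
    """Return (cosine similarity, Jaccard distance, Euclidean distance).

    Assumption: both texts are represented as term-count vectors over
    their joint vocabulary; the paper may use a different representation.
    """
    t_counts = Counter(tokenize(title))
    s_counts = Counter(tokenize(summary))
    vocab = sorted(set(t_counts) | set(s_counts))
    t_vec = [t_counts[w] for w in vocab]
    s_vec = [s_counts[w] for w in vocab]

    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(t_vec, s_vec))
    t_norm = math.sqrt(sum(a * a for a in t_vec))
    s_norm = math.sqrt(sum(b * b for b in s_vec))
    cosine = dot / (t_norm * s_norm) if t_norm and s_norm else 0.0

    # Jaccard distance on the token sets: 1 - |intersection| / |union|.
    union = set(t_counts) | set(s_counts)
    inter = set(t_counts) & set(s_counts)
    jaccard = 1.0 - len(inter) / len(union) if union else 0.0

    # Euclidean distance between the two count vectors.
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_vec, s_vec)))
    return cosine, jaccard, euclid


# Example: one composite feature tuple for a title and its summarized content.
features = distance_features(
    "Senate passes budget bill",
    "The senate passes the budget bill after a long debate",
)
```

In a pipeline of the kind the abstract outlines, one such tuple per article would form a row of the feature matrix passed to a classifier; the choice of classifier and any scaling of the three values are left open here.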

Data availability

The authors confirm that all data generated or analyzed during this study are included in this article.

Notes

https://www.teenvogue.com/story/the-best-tips-for-spotting-fake-news-in-the-age-of-trump.
https://www.nytimes.com/2016/11/25/world/europe/fake-news-donald-trump-hillary-clinton-georgia.html.

Acknowledgements

Not applicable.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Jamshedpur, Jamshedpur, Jharkhand, 831014, India
Joy Gorai & Dilip Kumar Shaw

Contributions

Joy Gorai prepared the original draft; Dilip Kumar Shaw supervised the work. All authors reviewed the manuscript.

Corresponding author

Correspondence to Joy Gorai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Availability of supporting data

The paper does not include any supporting data.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to publish

There is no content that requires permission of any third-party organizations or persons to publish the above manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gorai, J., Shaw, D.K. Semantic difference-based feature extraction technique for fake news detection. J Supercomput 80, 22631–22653 (2024). https://doi.org/10.1007/s11227-024-06307-2

Accepted: 13 June 2024
Published: 26 June 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s11227-024-06307-2
