Abstract
The rise of fake news presents a critical challenge to societal stability and calls for efficient detection systems. This study introduces an approach to identifying fake news that exploits the semantic discrepancies between the titles and contents of news articles. Our method first summarizes each article's content using a feature selection (FS) technique, then calculates vector distances between the title and the summarized content using cosine similarity, Jaccard distance, and Euclidean distance. Using three measures captures several distinct characteristics of semantic dissimilarity, resulting in a more thorough examination. The resulting values are combined into composite features, which are used to train various machine learning (ML) and deep learning (DL) models on three distinct news article datasets. Our approach achieves an accuracy of nearly 99.9%. The results underscore the effectiveness of leveraging semantic differences between article titles and content, offering a robust alternative to methods that focus solely on individual textual components. This technique not only improves accuracy but also provides a scalable solution for combating fake news in digital media.
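The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the bag-of-words vectors, and the function names `tokenize`, `summarize_top_k`, and `distance_features` are assumptions introduced here, and keeping the k most frequent tokens is only a stand-in for the FS-based summarization step, whose details the abstract does not give.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize_top_k(content, k=10):
    """Hypothetical stand-in for the FS step: keep the k most frequent tokens."""
    counts = Counter(tokenize(content))
    return [token for token, _ in counts.most_common(k)]


def distance_features(title, content, k=10):
    """Return [cosine similarity, Jaccard distance, Euclidean distance]
    between the title and the summarized content."""
    title_counts = Counter(tokenize(title))
    summary_counts = Counter(summarize_top_k(content, k))

    # Build aligned count vectors over the shared vocabulary.
    vocab = sorted(set(title_counts) | set(summary_counts))
    a = [title_counts[t] for t in vocab]
    b = [summary_counts[t] for t in vocab]

    # Cosine similarity; defined as 0.0 when either vector is empty.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Jaccard distance on the token sets.
    union = set(title_counts) | set(summary_counts)
    shared = set(title_counts) & set(summary_counts)
    jaccard = 1.0 - len(shared) / len(union) if union else 0.0

    # Euclidean distance between the count vectors.
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return [cosine, jaccard, euclidean]


if __name__ == "__main__":
    print(distance_features(
        "Scientists discover water on Mars",
        "Celebrity gossip dominated the headlines this week as rumors spread.",
    ))
```

Each article yields one three-element vector, which would then serve as the input row for whichever ML or DL classifier is being trained.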







Data availability
The author confirms that all data generated or analyzed during this study are included in this article.
Acknowledgements
Not applicable.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
Joy Gorai contributed to writing (original draft preparation); Dilip Kumar Shaw was involved in supervision. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Availability of supporting data
The paper does not include any supporting data.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to publish
There is no content that requires permission of any third-party organizations or persons to publish the above manuscript.
About this article
Cite this article
Gorai, J., Shaw, D.K. Semantic difference-based feature extraction technique for fake news detection. J Supercomput 80, 22631–22653 (2024). https://doi.org/10.1007/s11227-024-06307-2
DOI: https://doi.org/10.1007/s11227-024-06307-2