Semantic difference-based feature extraction technique for fake news detection

The Journal of Supercomputing

Abstract

The rise of fake news presents a critical challenge to societal stability and makes efficient detection systems an urgent need. This study introduces an approach to identifying fake news that exploits the semantic discrepancies between the titles and content of news articles. The method first summarizes article content using a feature selection (FS) technique, then calculates vector distances between each title and its summarized content using cosine similarity, Jaccard distance, and Euclidean distance. Using three measures captures several distinct aspects of semantic dissimilarity and allows a more thorough examination. The resulting distance values are combined into composite features, which are used to train various machine learning (ML) and deep learning (DL) models on three distinct news article datasets. The approach achieves an accuracy of nearly 99.9%. The results underscore the effectiveness of leveraging semantic differences between article titles and content, offering a robust alternative to methods that focus solely on individual textual components. The technique not only improves accuracy but also provides a scalable solution for combating fake news in digital media.
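As an illustration of the three measures named in the abstract, the following minimal sketch computes cosine similarity, Jaccard distance, and Euclidean distance between a title and a summarized content string. It is not the authors' implementation: the excerpt does not specify the vectorization or the FS-based summarizer, so the sketch assumes plain bag-of-words term counts, a simple regex tokenizer, and the hypothetical function names `tokenize` and `distance_features`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distance_features(title, summary):
    """Return (cosine similarity, Jaccard distance, Euclidean distance).

    Both texts are mapped to term-count vectors over their joint vocabulary.
    This bag-of-words representation is an assumption made for illustration.
    """
    t_counts = Counter(tokenize(title))
    s_counts = Counter(tokenize(summary))
    vocab = sorted(set(t_counts) | set(s_counts))
    t_vec = [t_counts[w] for w in vocab]
    s_vec = [s_counts[w] for w in vocab]

    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(a * b for a, b in zip(t_vec, s_vec))
    t_norm = math.sqrt(sum(a * a for a in t_vec))
    s_norm = math.sqrt(sum(b * b for b in s_vec))
    cosine = dot / (t_norm * s_norm) if t_norm and s_norm else 0.0

    # Jaccard distance: 1 minus the ratio of shared terms to all terms.
    union = set(t_counts) | set(s_counts)
    shared = set(t_counts) & set(s_counts)
    jaccard = 1.0 - len(shared) / len(union) if union else 0.0

    # Euclidean distance between the two count vectors.
    euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_vec, s_vec)))

    return cosine, jaccard, euclidean


if __name__ == "__main__":
    features = distance_features(
        "Senate passes budget bill",
        "The senate voted on Tuesday to pass the annual budget bill",
    )
    print(features)
```

The returned triple could then be used as a feature vector for a classifier, which is the role the abstract assigns to the combined distance values.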

Data availability

The author confirms that all data generated or analyzed during this study are included in this article.

Notes

  1. https://www.teenvogue.com/story/the-best-tips-for-spotting-fake-news-in-the-age-of-trump.

  2. https://www.nytimes.com/2016/11/25/world/europe/fake-news-donald-trump-hillary-clinton-georgia.html.

Acknowledgements

Not applicable.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Contributions

Joy Gorai prepared the original draft; Dilip Kumar Shaw supervised the work. All authors reviewed the manuscript.

Corresponding author

Correspondence to Joy Gorai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Availability of supporting data

The paper does not include any supporting data.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to publish

There is no content that requires permission of any third-party organizations or persons to publish the above manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gorai, J., Shaw, D.K. Semantic difference-based feature extraction technique for fake news detection. J Supercomput 80, 22631–22653 (2024). https://doi.org/10.1007/s11227-024-06307-2

  • DOI: https://doi.org/10.1007/s11227-024-06307-2

Keywords