
A hybrid model for the detection of multi-agent written news articles based on linguistic features and BERT

The Journal of Supercomputing

Abstract

Large language models (LLMs) are central to AI systems and excel in natural language processing tasks. Their output blurs the line between human-written and machine-generated text, and they are widely used by professional writers across domains, including news article generation. Detecting LLM-written articles therefore poses new challenges concerning misuse and the generation of fake content. In this work, we aim to recognize two kinds of LLM-written news: articles generated entirely by LLMs and articles paraphrased from existing news sources. We propose a neural network model that combines linguistic features with BERT contextual embedding features for LLM-written news article detection. Alongside the proposed model, we also construct a news article corpus derived from the BBC dataset, in which news articles are generated and paraphrased through multi-agent cooperation using ChatGPT. Our model obtains 96.57% accuracy and a 96.44% macro-F1 score, outperforming existing models and demonstrating its potential to help readers identify LLM-written news articles. To assess the model's robustness, we construct a further corpus based on the BBC dataset using a different language model, Claude, and show that our detection model still achieves strong results. Furthermore, we apply our model to text generation detection in the medical domain, where it also delivers promising performance.
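The abstract describes a hybrid detector that fuses hand-crafted linguistic features with BERT contextual embeddings before classification. The minimal sketch below illustrates one way such a fusion could be wired up in Python with PyTorch and Hugging Face Transformers; the specific linguistic features (average word length, type-token ratio, average sentence length), the bert-base-uncased checkpoint, the layer sizes, and the label mapping are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a hybrid detector fusing linguistic features with BERT embeddings.
# Feature choices, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel


def linguistic_features(text: str) -> torch.Tensor:
    """Toy linguistic features: average word length, type-token ratio,
    and average sentence length (in words)."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    type_token_ratio = len(set(w.lower() for w in words)) / max(len(words), 1)
    avg_sent_len = len(words) / max(len(sentences), 1)
    return torch.tensor([avg_word_len, type_token_ratio, avg_sent_len], dtype=torch.float)


class HybridDetector(nn.Module):
    """Concatenates the BERT [CLS] embedding with linguistic features and
    feeds the fused vector to a small feed-forward classifier."""

    def __init__(self, bert_name: str = "bert-base-uncased",
                 num_ling_feats: int = 3, num_classes: int = 2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        self.classifier = nn.Sequential(
            nn.Linear(hidden + num_ling_feats, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask, ling_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_emb = out.last_hidden_state[:, 0, :]           # [CLS] contextual embedding
        fused = torch.cat([cls_emb, ling_feats], dim=-1)   # fuse both feature views
        return self.classifier(fused)


# Usage: score a single article as human-written vs. LLM-written.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = HybridDetector()
text = "Example news article text..."
enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
feats = linguistic_features(text).unsqueeze(0)
logits = model(enc["input_ids"], enc["attention_mask"], feats)
pred = logits.argmax(dim=-1)  # 0 = human-written, 1 = LLM-written (label mapping assumed)
```

In this sketch the two feature views are simply concatenated; the trained weights, feature set, and any attention-based fusion used in the published model are not reproduced here.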


Data availability

No datasets were generated or analyzed during the current study.


Funding

This work was partially supported by the National Science and Technology Council (NSTC), Taiwan, under Grant Number 112-2622-E-029-009.

Author information

Authors and Affiliations

Authors

Contributions

C.-S. Lin contributed to concept development, methodology, investigation, data collection, experiment design, and writing.

Corresponding author

Correspondence to Ching-Sheng Lin.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lin, CS. A hybrid model for the detection of multi-agent written news articles based on linguistic features and BERT. J Supercomput 81, 381 (2025). https://doi.org/10.1007/s11227-024-06882-4
