Abstract
Large language models (LLMs) are central to modern AI systems and excel at natural language processing tasks. They blur the line between human-written and machine-generated text and are widely used by professional writers across domains, including news article generation. This raises novel concerns about misuse and the production of fake content, making the detection of LLM-written articles a pressing challenge. In this work, we aim to recognize two kinds of LLM-written news: articles generated entirely by LLMs and articles paraphrased from existing news sources. We propose a neural network model that combines linguistic features with BERT contextual embeddings for LLM-written news article detection. In conjunction with the proposed model, we also build a news article corpus based on the BBC dataset, generating and paraphrasing articles through multi-agent cooperation with ChatGPT. Our model achieves 96.57% accuracy and a 96.44% macro F1 score, outperforming existing models and indicating its potential to help readers identify LLM-written news articles. To assess robustness, we construct another corpus from the BBC dataset using a different language model, Claude, and show that our detector maintains strong results. Furthermore, we apply our model to generated-text detection in the medical domain, where it also delivers promising performance.
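The abstract describes a hybrid architecture that fuses hand-crafted linguistic features with BERT contextual embeddings. Below is a minimal sketch of such a fusion classifier, assuming a standard bert-base encoder, an illustrative 16-dimensional linguistic feature vector (e.g. type-token ratio, readability, sentence-length statistics), and a three-way label set (human-written, fully LLM-generated, LLM-paraphrased); the feature set, dimensions, and head layout are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch (illustrative, not the paper's code) of a hybrid detector that
# concatenates BERT contextual embeddings with hand-crafted linguistic features.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class HybridDetector(nn.Module):
    def __init__(self, num_linguistic_features: int = 16, num_classes: int = 3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # Fusion head: BERT [CLS] vector concatenated with linguistic features.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + num_linguistic_features, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_classes),  # assumed labels: human / LLM-generated / LLM-paraphrased
        )

    def forward(self, input_ids, attention_mask, linguistic_features):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]  # [CLS] contextual embedding
        fused = torch.cat([cls_vec, linguistic_features], dim=-1)
        return self.classifier(fused)

# Example forward pass on one article with a placeholder linguistic feature vector.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("Example news article text.", return_tensors="pt",
                truncation=True, max_length=512)
model = HybridDetector()
logits = model(enc["input_ids"], enc["attention_mask"], torch.zeros(1, 16))
print(logits.shape)  # torch.Size([1, 3])
```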



Data availability
No datasets were generated or analyzed during the current study.
Funding
This work was partially supported by the National Science and Technology Council (NSTC), Taiwan, under Grant Number 112-2622-E-029-009.
Author information
Contributions
CS Lin contributed to concept development, methodology, investigation, data collection, experiment design and writing.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, CS. A hybrid model for the detection of multi-agent written news articles based on linguistic features and BERT. J Supercomput 81, 381 (2025). https://doi.org/10.1007/s11227-024-06882-4