Abstract
The advent of Large Language Models (LLMs) has led to a surge in Natural Language Generation (NLG), aiding humans in composing text for various tasks. However, these models can also be misused. For instance, distinguishing artificially generated text from original text is a concern in academia. Existing detection research does not attempt to replicate how humans would actually use these models. In our work, we address this issue by leveraging data generated by mimicking how humans would use LLMs when composing academic works. Our study examines the detectability of the generated text using DetectGPT and GLTR, as well as state-of-the-art classification models such as SciBERT, RoBERTa, DeBERTa, XLNet, and ELECTRA. Our experiments show that existing models struggle to detect generated text when it is produced by an LLM fine-tuned on the remainder of a paper. This highlights the importance of using realistic and challenging datasets in future research aimed at detecting artificially generated text.
References
Bhat, A.: GPT-wiki-intro (revision 0e458f5). https://huggingface.co/datasets/aadityaubhat/GPT-wiki-intro
Gehrmann, S., Strobelt, H., Rush, A.M.: GLTR: statistical detection and visualization of generated text. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 111–116 (2019)
Glazkova, A., Glazkov, M.: Detecting generated scientific papers using an ensemble of transformer models. In: Proceedings of the Third Workshop on Scholarly Document Processing, pp. 223–228 (2022)
Kashnitsky, Y., Herrmannova, D., de Waard, A., Tsatsaronis, G., Fennell, C., Labbé, C.: Overview of the DAGPap22 shared task on detecting automatically generated scientific papers. In: Third Workshop on Scholarly Document Processing (2022)
Liyanage, V., Buscaldi, D., Nazarenko, A.: A benchmark corpus for the detection of automatically generated text in academic publications. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 4692–4700 (2022)
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C.D., Finn, C.: DetectGPT: zero-shot machine-generated text detection using probability curvature. arXiv preprint arXiv:2301.11305 (2023)
Rodriguez, J., Hay, T., Gros, D., Shamsi, Z., Srinivasan, R.: Cross-domain detection of GPT-2-generated technical text. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1213–1233 (2022)
Rosati, D.: SynSciPass: detecting appropriate uses of scientific text generation. In: Proceedings of the Third Workshop on Scholarly Document Processing, pp. 214–222 (2022)
Zellers, R., et al.: Defending against neural fake news. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liyanage, V., Buscaldi, D. (2023). Detecting Artificially Generated Academic Text: The Importance of Mimicking Human Utilization of Large Language Models. In: Métais, E., Meziane, F., Sugumaran, V., Manning, W., Reiff-Marganiec, S. (eds) Natural Language Processing and Information Systems. NLDB 2023. Lecture Notes in Computer Science, vol 13913. Springer, Cham. https://doi.org/10.1007/978-3-031-35320-8_42
DOI: https://doi.org/10.1007/978-3-031-35320-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35319-2
Online ISBN: 978-3-031-35320-8
eBook Packages: Computer Science, Computer Science (R0)