Abstract
Computer-generated (artificial) text is now abundant on the web, ranging from basic random word salads to text assembled by web scraping. In this paper, we present a short systematic review of existing automated methods for distinguishing natural texts from artificially generated ones. The methods were selected according to specific criteria, and we provide a summary of those considered. Whenever possible, comparisons use common evaluation measures and control for differences in experimental set-up.
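As a minimal illustration of the kind of automated method such a review covers, and not a method taken from this paper, the sketch below scores a text by its average per-bigram log-probability under an add-one-smoothed bigram language model trained on natural text. Random word salads tend to receive lower scores than natural sentences. The toy corpus, the whitespace tokenizer, add-one smoothing, the threshold value, and the names `BigramModel` and `is_artificial` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercased whitespace tokenization with sentence boundary markers (a simplifying assumption).
    return ["<s>"] + text.lower().split() + ["</s>"]


class BigramModel:
    """Hypothetical add-one (Laplace) smoothed bigram language model."""

    def __init__(self, corpus: List[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus:
            tokens = tokenize(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Number of observed word types plus one slot for unseen words.
        self.vocab_size = len(self.unigrams) + 1

    def log_prob(self, prev: str, word: str) -> float:
        # log P(word | prev) with add-one smoothing.
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + self.vocab_size
        return math.log(numerator / denominator)

    def score(self, text: str) -> float:
        # Average log-probability per bigram; higher values look more "natural".
        tokens = tokenize(text)
        pairs = list(zip(tokens, tokens[1:]))
        return sum(self.log_prob(p, w) for p, w in pairs) / len(pairs)


def is_artificial(model: BigramModel, text: str, threshold: float) -> bool:
    # Flag texts whose average log-probability falls below a (hypothetical) tuned threshold.
    return model.score(text) < threshold


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat and a dog sat together",
    ]
    model = BigramModel(corpus)
    for text in ["the cat sat on the rug", "rug the on sat cat the"]:
        print(text, "->", is_artificial(model, text, threshold=-2.3))
```

On this toy corpus the shuffled sentence scores lower than the natural one and is flagged. In practice the threshold would be tuned on labeled natural and artificial samples, and higher-order models with stronger smoothing would replace the add-one bigram model used here for brevity.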
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Beresneva, D. (2016). Computer-Generated Text Detection Using Machine Learning: A Systematic Review. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_43
DOI: https://doi.org/10.1007/978-3-319-41754-7_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41753-0
Online ISBN: 978-3-319-41754-7
eBook Packages: Computer Science, Computer Science (R0)