Automatically Determining an Anonymous Author’s Native Language

Koppel, Moshe; Schler, Jonathan; Zigdon, Kfir

doi:10.1007/11427995_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3495)

Included in the following conference series:

International Conference on Intelligence and Security Informatics

Abstract

Text authored by an unidentified assailant can offer valuable clues to the assailant’s identity. In this paper, we show that stylistic text features can be exploited to determine an anonymous author’s native language with high accuracy.

Author information

Authors and Affiliations

Department of Computer Science, Bar-Ilan University, 51900, Ramat-Gan, Israel
Moshe Koppel, Jonathan Schler & Kfir Zigdon

Editor information

Editors and Affiliations

Department of Library and Information Science, Rutgers University
Paul Kantor
School of Communication, Information and Library Studies, Rutgers University, 4 Huntington Street, 08901-1071, New Brunswick, NJ, USA
Gheorghe Muresan
Artificial Solutions, Altonaer Poststraße 13b, 22767, Hamburg, Germany
Fred Roberts
MIS Department, University of Arizona, 85721, Tucson, AZ, USA
Daniel D. Zeng
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Fei-Yue Wang
Department of Management Information Systems, Eller College of Management, The University of Arizona, 85721, AZ, USA
Hsinchun Chen
College of Computing, Georgia Tech Information Security Center, Georgia Institute of Technology, 801 Atlantic Drive, 30332-0280, Atlanta, GA, USA
Ralph C. Merkle

About this paper

Cite this paper

Koppel, M., Schler, J., Zigdon, K. (2005). Automatically Determining an Anonymous Author’s Native Language. In: Kantor, P., et al. Intelligence and Security Informatics. ISI 2005. Lecture Notes in Computer Science, vol 3495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11427995_17

DOI: https://doi.org/10.1007/11427995_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25999-2
Online ISBN: 978-3-540-32063-0
eBook Packages: Computer Science, Computer Science (R0)
