Abstract
Text authored by an unidentified assailant can offer valuable clues to the assailant’s identity. In this paper, we show that stylistic text features can be exploited to determine an anonymous author’s native language with high accuracy.
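As a rough illustration of what "stylistic text features" can look like in practice, the sketch below computes two generic kinds of feature that are independent of topic: relative frequencies of function words and counts of character n-grams. It is a minimal sketch under stated assumptions, not the feature set used in the paper; the word list and the names `FUNCTION_WORDS`, `tokenize`, `function_word_frequencies`, and `char_ngram_counts` are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = ["the", "a", "of", "in", "to", "and", "that", "is"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each listed function word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]


def char_ngram_counts(text, n=2):
    """Count overlapping character n-grams inside each word token."""
    counts = Counter()
    for token in tokenize(text):
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    sample = "The cat sat in the hat"
    print(function_word_frequencies(sample))
    print(char_ngram_counts(sample).most_common(3))
```

Vectors like these could then be passed to any standard text classifier trained on documents labelled with the author's native language; the choice of classifier is left open here.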
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koppel, M., Schler, J., Zigdon, K. (2005). Automatically Determining an Anonymous Author’s Native Language. In: Kantor, P., et al. Intelligence and Security Informatics. ISI 2005. Lecture Notes in Computer Science, vol 3495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11427995_17
DOI: https://doi.org/10.1007/11427995_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25999-2
Online ISBN: 978-3-540-32063-0
eBook Packages: Computer Science, Computer Science (R0)