Abstract
In this study, the first comprehensive n-gram-based text classification for Turkish is carried out. We address three tasks automatically: identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classifiers, and their results are compared. Success rates of 83%, 93% and 96% were obtained for identifying the author, the genre and the author's gender, respectively.
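The abstract does not state the exact n-gram configuration (character or word n-grams, which values of n) or how the classifiers were implemented. The following is a minimal sketch, assuming character bigrams and trigrams, of how n-gram counts can feed one of the listed classifiers, a multinomial Naive Bayes with add-one (Laplace) smoothing. The names `char_ngrams` and `NGramNaiveBayes`, the toy sentences and the genre labels `spor`/`ekonomi` are hypothetical illustrations, not the paper's code or corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=3):
    """Return a Counter of character n-grams (n_min..n_max) of a lowercased text.

    Assumption: character n-grams with n = 2..3; the paper's settings may differ.
    Note: str.lower() is not Turkish-locale aware ("I" becomes "i", not "ı").
    """
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


class NGramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over n-gram counts, Laplace-smoothed."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.gram_counts = defaultdict(Counter)  # class -> n-gram counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram occurrences per class, used in the smoothing denominator
        self.totals = {c: sum(g.values()) for c, g in self.gram_counts.items()}
        return self

    def log_posterior(self, text, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.totals[label] + len(self.vocab)
        for gram, count in char_ngrams(text).items():
            # Add-one smoothing: an n-gram unseen in this class still gets count 1
            score += count * math.log((self.gram_counts[label][gram] + 1) / denom)
        return score

    def predict(self, text):
        # Pick the class with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


if __name__ == "__main__":
    # Hypothetical toy data for a two-genre task
    train_texts = [
        "maç golle bitti takım kazandı",
        "takım maçta iki gol attı",
        "borsa faiz oranları yükseldi",
        "enflasyon ve faiz piyasayı etkiledi",
    ]
    train_labels = ["spor", "spor", "ekonomi", "ekonomi"]
    model = NGramNaiveBayes().fit(train_texts, train_labels)
    print(model.predict("takım son dakika golüyle kazandı"))
```

The same n-gram count vectors could instead be given to SVM, C4.5 or Random Forest implementations; Naive Bayes is used here only because it fits in a few lines of standard-library code.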
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amasyalı, M.F., Diri, B. (2006). Automatic Turkish Text Categorization in Terms of Author, Genre and Gender. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_22
DOI: https://doi.org/10.1007/11765448_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34616-6
Online ISBN: 978-3-540-34617-3
eBook Packages: Computer Science, Computer Science (R0)