Automatic Turkish Text Categorization in Terms of Author, Genre and Gender

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3999)

Abstract

This study presents a first comprehensive n-gram-based text classification for Turkish. We address three tasks, each performed automatically: identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classification methods, and their results are compared. The success rates obtained for identifying the author of a text, the genre of a text and the gender of the author were 83%, 93% and 96%, respectively.
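The abstract does not say how the n-gram features are built or how the classifiers are configured, so the following is only a minimal sketch of the general idea: n-gram counts fed to a Naive Bayes classifier. Character bigrams, add-one smoothing, the class name NgramNaiveBayes, the genre labels and the four toy sentences are all assumptions made for illustration; none of them come from the paper or its corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of the lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts (hypothetical helper class)."""

    def __init__(self, n=2):
        self.n = n
        self.class_docs = Counter()                # documents seen per class
        self.ngram_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()                         # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # log prior plus add-one smoothed log likelihood of each n-gram
            score = math.log(n_docs / total_docs)
            for gram in grams:
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not from the paper's corpus).
train_texts = [
    "takım maçı kazandı",
    "futbol maçı oynandı",
    "borsa bugün yükseldi",
    "dolar ve faiz düştü",
]
train_labels = ["sports", "sports", "economy", "economy"]

model = NgramNaiveBayes(n=2).fit(train_texts, train_labels)
print(model.predict("maçı kazandı"))
print(model.predict("faiz yükseldi"))
```

The same feature extraction could feed any of the other classifiers named in the abstract; only the model behind fit and predict would change. Note that Python's str.lower() is not locale-aware, so real Turkish text containing the capital letters I or İ would need Turkish-specific case folding before n-grams are extracted.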

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amasyalı, M.F., Diri, B. (2006). Automatic Turkish Text Categorization in Terms of Author, Genre and Gender. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_22

  • DOI: https://doi.org/10.1007/11765448_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34616-6

  • Online ISBN: 978-3-540-34617-3

  • eBook Packages: Computer Science, Computer Science (R0)
