Abstract
Text classification is the labeling of natural, unstructured text documents with predefined categories or classes. Such classification not only helps organizations improve their business communication skills and customer satisfaction levels, but also improves the use of unstructured data in academic and non-academic settings. The aim of this study is to analyze the effect of stemming, over-sampling, and stop-word removal on the automatic classification of Turkish content. After obtaining a Turkish corpus, stemming, balancing, and stop-word removal are applied and the results are evaluated.
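The three preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the stop-word set, the suffix list, and the function names are hypothetical, the suffix stripper is a toy stand-in for a real Turkish stemmer, and balancing is shown as plain random duplication of minority-class documents, which may differ from the over-sampling method the authors used.

```python
import random
from collections import Counter

# Hypothetical, deliberately tiny resources (assumptions, not from the study).
STOPWORDS = {"ve", "bir", "bu", "ile"}
SUFFIXES = ("lar", "ler", "da", "de")  # toy list, not a real Turkish stemmer


def remove_stopwords(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split on whitespace, remove stop words, then stem."""
    # Note: str.lower() does not apply Turkish casing rules (I -> ı),
    # so a real pipeline would need locale-aware lowercasing.
    tokens = text.lower().split()
    return [toy_stem(t) for t in remove_stopwords(tokens)]


def random_oversample(docs, labels, seed=0):
    """Duplicate random minority-class documents until all classes
    have as many documents as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_docs, out_labels = list(docs), list(labels)
    for label, count in counts.items():
        pool = [d for d, l in zip(docs, labels) if l == label]
        for _ in range(target - count):
            out_docs.append(rng.choice(pool))
            out_labels.append(label)
    return out_docs, out_labels


if __name__ == "__main__":
    print(preprocess("kitaplar ve evler"))
    docs, labels = random_oversample(["a", "b", "c"], ["x", "x", "y"])
    print(Counter(labels))
```

The balanced, preprocessed documents would then be vectorized and passed to a classifier. Comparing runs with each step switched on or off is one way to measure the effect of that step.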
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Çağataylı, M., Çelebi, E. (2015). The Effect of Stemming and Stop-Word-Removal on Automatic Text Classification in Turkish Language. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_19
DOI: https://doi.org/10.1007/978-3-319-26532-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2
eBook Packages: Computer Science (R0)