Abstract
Mining textual data in chat mediums is becoming increasingly important because these mediums contain a vast amount of information that is potentially relevant to a society’s current interests, habits, social behaviors, criminal tendencies, and other tendencies. Here, sex identification is taken as a base study in information mining in chat mediums. To this end, a simple discrimination function and a semantic analysis method are proposed for sex identification in Turkish chat mediums. The proposed sex identification method is then compared with the Support Vector Machine (SVM) and Naive Bayes (NB) methods. The results show that the proposed system achieves an accuracy of over 90% in sex identification.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Köse, C., Özyurt, Ö., İkibaş, C. (2008). A Comparison of Textual Data Mining Methods for Sex Identification in Chat Conversations. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_76
DOI: https://doi.org/10.1007/978-3-540-68636-1_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68633-0
Online ISBN: 978-3-540-68636-1
eBook Packages: Computer Science, Computer Science (R0)