Abstract
Chat mediums are becoming an important part of everyday life and provide useful information about people, such as their current interests, habits, social behaviors and tendencies. In this study, we present a system that identifies the sex of a person in a Turkish chat medium. Sex identification is treated here as a baseline task for information mining in chat mediums. The system acquires data from a chat medium, automatically detects each chatter's sex from the information exchanged between chatters, and compares its predictions with the chatters' known identities. A simple discrimination function is used to determine the sex of each chatter, and a semantic analysis method is proposed to improve the system's performance. With the semantic analyzer, the system achieves an accuracy of over 90% in sex identification on a real chat medium.
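The abstract does not spell out the discrimination function, so the following is only a minimal sketch of one plausible form: a weighted word score whose sign decides the class, followed by a comparison against known identities. The names `WORD_WEIGHTS`, `discriminate` and `accuracy`, the placeholder tokens, the weights and the threshold are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical per-word weights: positive values lean "female",
# negative values lean "male". Tokens and numbers are invented
# for illustration only; they are not taken from the paper.
WORD_WEIGHTS = {
    "token_a": 0.8,
    "token_b": 0.3,
    "token_c": -0.6,
    "token_d": -0.9,
}


def discriminate(messages, weights=WORD_WEIGHTS, threshold=0.0):
    """Classify one chatter from a list of message strings.

    Returns a (label, score) pair, where score is the mean weight
    of all weighted words the chatter used.
    """
    # Count every whitespace-separated, lowercased word the chatter typed.
    counts = Counter(word for msg in messages for word in msg.lower().split())

    # Only words that carry a weight contribute to the decision.
    matched = sum(n for word, n in counts.items() if word in weights)
    if matched == 0:
        return "unknown", 0.0

    total = sum(weights[word] * n for word, n in counts.items() if word in weights)
    score = total / matched

    if score > threshold:
        return "female", score
    if score < threshold:
        return "male", score
    return "unknown", score


def accuracy(predictions, known):
    """Fraction of predicted labels that match the known identities."""
    pairs = list(zip(predictions, known))
    if not pairs:
        return 0.0
    return sum(p == k for p, k in pairs) / len(pairs)
```

Under these assumptions, a chatter's messages are passed to `discriminate` as a list of strings, and the predicted labels for all chatters are then compared with their known identities through `accuracy`. A semantic analysis step, as proposed in the abstract, would sit in front of this function and is not modeled here.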
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Köse, C., Özyurt, Ö., Amanmyradov, G. (2007). Mining Chat Conversations for Sex Identification. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_6
DOI: https://doi.org/10.1007/978-3-540-77018-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)