Mining Chat Conversations for Sex Identification

  • Conference paper
Emerging Technologies in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4819)

Abstract

Chat mediums are becoming an important part of everyday life and provide useful information about people, such as their current interests, habits, social behaviors and tendencies. In this study, we present a system that identifies the sex of a person in a Turkish chat medium. Sex identification serves here as a base study for information mining in chat mediums. The system acquires data from a chat medium, automatically detects each chatter's sex from the information exchanged between chatters, and compares the results with the chatters' known identities. A simple discrimination function is used to determine the sex of the chatters, and a semantic analysis method is proposed to enhance the performance of the system. With the semantic analyzer, the system achieved an accuracy of over 90% in sex identification in a real chat medium.

Editor information

Takashi Washio, Zhi-Hua Zhou, Joshua Zhexue Huang, Xiaohua Hu, Jinyan Li, Chao Xie, Jieyue He, Deqing Zou, Kuan-Ching Li, Mário M. Freire

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Köse, C., Özyurt, Ö., Amanmyradov, G. (2007). Mining Chat Conversations for Sex Identification. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_6

  • DOI: https://doi.org/10.1007/978-3-540-77018-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77016-9

  • Online ISBN: 978-3-540-77018-3

  • eBook Packages: Computer Science, Computer Science (R0)
