Abstract
In this paper we study how to recognize a user's identity from instant messages. Considering the special characteristics of chat text, we focus on three problems: how to extract features from chat text, how the user model is affected by the size of the training data, and which classification model is suited to this problem. The chat corpus used in this paper was collected from a Chinese IM tool, and different feature selection methods and classification models are evaluated on it.
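The abstract does not say which features or classifiers were evaluated. As a purely illustrative sketch of the general setup (extract features from each user's messages, build a per-user model, classify a new message), the following assumes character bigram counts as features and a nearest-profile cosine classifier. Both choices, the function names, and the toy messages are assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in one message (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(messages, n=2):
    """Sum n-gram counts over all of a user's training messages."""
    profile = Counter()
    for msg in messages:
        profile.update(char_ngrams(msg, n))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(message, profiles, n=2):
    """Return the user whose profile is most similar to the message."""
    query = char_ngrams(message, n)
    return max(profiles, key=lambda user: cosine(query, profiles[user]))


# Hypothetical toy training data; a real corpus would be far larger.
training = {
    "user_a": ["haha ok ok", "ok haha see you", "haha"],
    "user_b": ["Fine. Talk later.", "Later then.", "Fine."],
}
profiles = {user: build_profile(msgs) for user, msgs in training.items()}
guess = identify("haha ok", profiles)
```

Because a profile is just the sum of counts over a user's messages, rebuilding it from a smaller or larger subset of `training[user]` is one simple way to probe how training size affects the user model. Character n-grams are convenient here because they need no word segmentation, which matters for Chinese chat text.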
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, Y., Meng, X., Chai, G., Tang, Y. (2011). User Identification for Instant Messages. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol 7064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24965-5_13
DOI: https://doi.org/10.1007/978-3-642-24965-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24964-8
Online ISBN: 978-3-642-24965-5
eBook Packages: Computer Science (R0)