Abstract
Short text such as micro-blog messages is becoming increasingly prevalent in China. Because the features associated with short text are sparse, accurately classifying short text and tagging user interest have become important yet challenging tasks. Many recent studies use external data to address the data sparsity issue, but they fail to leverage social correlation, which is expected to improve the accuracy of short text classification. In this paper, we present a new method that uses a semi-supervised coupled mutual reinforcement framework based on social correlation to classify short text and tag user interest simultaneously. Specifically, our method requires relatively few labeled examples to initialize the training process. More importantly, experimental results demonstrate that our method achieves 100% accuracy on certain categories and significantly improves accuracy on the other categories. The experiments also show that our model is effective for user interest tagging.
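The abstract does not give the update rules of the coupled mutual reinforcement framework, so the following is only a minimal sketch of the general idea, written under our own assumptions. Message class scores and user interest distributions reinforce each other through authorship, a social-correlation graph lets related users pull each other's interests together, and a handful of labeled messages seed the process. The names `text_scores`, `author_of`, `correlated`, `seeds`, `alpha`, and `beta`, as well as the simple weighted averaging in each step, are hypothetical illustrations and not the authors' formulation.

```python
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale non-negative scores to sum to 1; fall back to uniform if all are zero."""
    total = sum(vec)
    if total <= 0:
        return [1.0 / len(vec)] * len(vec)
    return [v / total for v in vec]


def mutual_reinforcement(
    text_scores: List[List[float]],    # hypothetical: per-message class scores from any text-only classifier
    author_of: List[int],              # author_of[m] = index of the user who posted message m
    correlated: Dict[int, List[int]],  # hypothetical social-correlation graph: user -> related users
    seeds: Dict[int, int],             # the few labeled messages: message index -> class index
    n_users: int,
    alpha: float = 0.5,                # assumed weight of a message's own text evidence
    beta: float = 0.5,                 # assumed weight of a user's own messages vs. related users
    iters: int = 20,
) -> Tuple[List[List[float]], List[List[float]]]:
    k = len(text_scores[0])
    text = [normalize(row) for row in text_scores]
    # Labeled seeds are clamped to one-hot distributions and never updated.
    msgs = [
        [1.0 if j == seeds[m] else 0.0 for j in range(k)] if m in seeds else text[m][:]
        for m in range(len(text))
    ]
    users = [[1.0 / k] * k for _ in range(n_users)]
    for _ in range(iters):
        # Step 1 (assumed rule): a user's interest is the normalized sum of their messages' scores.
        acc = [[0.0] * k for _ in range(n_users)]
        for m, u in enumerate(author_of):
            for j in range(k):
                acc[u][j] += msgs[m][j]
        own = [normalize(row) for row in acc]
        # Step 2 (assumed rule): social correlation pulls each user toward the mean of related users.
        users = []
        for u in range(n_users):
            related = correlated.get(u, [])
            if not related:
                users.append(own[u])
                continue
            mean = [sum(own[v][j] for v in related) / len(related) for j in range(k)]
            users.append(normalize([beta * own[u][j] + (1 - beta) * mean[j] for j in range(k)]))
        # Step 3 (assumed rule): each unlabeled message blends its text evidence with its author's interest.
        for m, u in enumerate(author_of):
            if m not in seeds:
                msgs[m] = normalize([alpha * text[m][j] + (1 - alpha) * users[u][j] for j in range(k)])
    return msgs, users


# Toy run: two classes, four messages, three users; user 2 has no messages
# but is socially correlated with user 0.
msgs, users = mutual_reinforcement(
    text_scores=[[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.5, 0.5]],
    author_of=[0, 0, 1, 1],
    correlated={2: [0]},
    seeds={0: 0, 2: 1},
    n_users=3,
)
print(msgs)
print(users)
```

In the toy run, the ambiguous message posted by user 0 drifts toward class 0 because that author's other message is a class-0 seed, and user 2, who posted nothing, acquires a class-0 leaning purely through the correlation edge to user 0. The fixed weights and iteration count are illustrative; a variant could instead stop once the scores change by less than a tolerance.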
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, R., Zhang, Y. (2013). Social-Correlation Based Mutual Reinforcement for Short Text Classification and User Interest Tagging. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_38
DOI: https://doi.org/10.1007/978-3-642-53914-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science (R0)