Abstract
User age information plays a crucial role in many real-world applications such as precision marketing, targeted promotion and personalized recommendation. In this paper, we focus on predicting the age range of Sina Weibo users. To protect user privacy, the only data available are users' basic profile information and their published messages (tweets), all of which are mapped to integers. From these otherwise meaningless integers, we must uncover underlying features and structures. Through analysis, we extract significant features related to age. To evaluate the correlation between users' basic information and their age ranges, we use mutual information as the measure. The traditional word-vector model of tweet content suffers from high dimensionality and data sparsity. To address this, we propose aggregated tweet features corresponding to different age ranges. Using these features, we compare many classification algorithms. The decision-tree-based model achieves the best prediction accuracy, up to 83%.
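The two feature steps in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper. `mutual_information` uses the standard discrete definition I(X;Y) = Σ p(x,y) · log2[p(x,y) / (p(x)·p(y))] to score an integer-coded profile attribute against age-range labels. `age_range_profiles` and `aggregated_features` are hypothetical names for one plausible way to collapse a sparse, vocabulary-sized vector of integer word IDs into a single dense score per age range. The abstract does not describe the paper's exact aggregation, so treat this as an illustration of the idea only.

```python
import math
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Standard discrete I(X;Y) in bits between an attribute X and age-range label Y."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def age_range_profiles(user_tweets, labels):
    """Hypothetical: word-ID frequency distribution per age range.

    user_tweets[i] is a list of tweets for user i; each tweet is a list of
    integer word IDs. labels[i] is that user's age range.
    """
    counts = defaultdict(Counter)
    for tweets, label in zip(user_tweets, labels):
        for tweet in tweets:
            counts[label].update(tweet)
    profiles = {}
    for label, c in counts.items():
        total = sum(c.values())
        profiles[label] = {w: k / total for w, k in c.items()}
    return profiles


def aggregated_features(tweets, profiles, ranges):
    """Hypothetical aggregation: one dense feature per age range.

    Each feature is the mean probability, under that range's profile, of the
    word IDs the user wrote. This replaces a sparse vocabulary-sized vector.
    """
    words = [w for tweet in tweets for w in tweet]
    if not words:
        return [0.0] * len(ranges)
    return [
        sum(profiles.get(r, {}).get(w, 0.0) for w in words) / len(words)
        for r in ranges
    ]


if __name__ == "__main__":
    # Toy data: an integer-coded profile field that fully determines the age range.
    print(mutual_information([0, 0, 1, 1], ["A", "A", "B", "B"]))  # 1.0
    profiles = age_range_profiles([[[1, 2], [2]], [[3]]], ["A", "B"])
    print(aggregated_features([[2, 3]], profiles, ["A", "B"]))
```

In this sketch, each user ends up with a short vector whose length equals the number of age ranges. That vector could be fed to a classifier such as a decision tree together with the integer-coded profile attributes.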
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Liu, T., Liu, H., He, J., Du, X. (2013). Predicting Microblog User’s Age Based on Text Information. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_45
DOI: https://doi.org/10.1007/978-3-642-41230-1_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41229-5
Online ISBN: 978-3-642-41230-1
eBook Packages: Computer Science, Computer Science (R0)