Abstract
User profession plays an important role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this paper, we explore the task of identifying user professions from their behaviors in social media. The task is non-trivial because it poses three challenges: how to incorporate heterogeneous information about user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present PRISM, a framework for PRofession Identification in Social Media. It takes advantage of both the personal information and the community structure of users in the following aspects: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence that a user belongs to each profession. (2) We present a multi-training process that takes advantage of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method that jointly considers the confidences from personal features and community structure. We collect a real-world dataset to conduct experiments, and the experimental results show that our method is significantly more effective than the baseline methods.
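As a rough illustration of aspect (3), the sketch below blends each user's own per-profession confidences with the average confidences of their neighbors and then picks the highest-scoring profession. This is not the PRISM algorithm: the linear blend, the weight `alpha`, the function names, and the toy data are all assumptions made for illustration, and the abstract does not specify how the two sources of confidence are actually combined.

```python
from typing import Dict, List


def combine_confidences(
    personal: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, List[float]]:
    """Blend each user's own confidences with the mean of their neighbors'.

    Hypothetical scheme: alpha weights the personal confidences and
    (1 - alpha) weights the neighborhood average.
    """
    combined: Dict[str, List[float]] = {}
    for user, own in personal.items():
        # Only neighbors with known confidences contribute.
        friends = [f for f in neighbors.get(user, []) if f in personal]
        if friends:
            mean = [
                sum(personal[f][k] for f in friends) / len(friends)
                for k in range(len(own))
            ]
        else:
            # No usable neighbors: fall back to the personal confidences.
            mean = list(own)
        combined[user] = [alpha * p + (1 - alpha) * m for p, m in zip(own, mean)]
    return combined


def predict(combined: Dict[str, List[float]], professions: List[str]) -> Dict[str, str]:
    """Assign each user the profession with the highest combined score."""
    return {
        user: professions[max(range(len(scores)), key=scores.__getitem__)]
        for user, scores in combined.items()
    }


# Toy data (invented): user "a" leans "engineer" alone, but both neighbors lean "teacher".
professions = ["engineer", "teacher"]
personal = {"a": [0.6, 0.4], "b": [0.1, 0.9], "c": [0.2, 0.8]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

labels_with_community = predict(combine_confidences(personal, neighbors, alpha=0.5), professions)
labels_personal_only = predict(combine_confidences(personal, neighbors, alpha=1.0), professions)
```

With `alpha=1.0` the community term is switched off, so comparing the two label dictionaries shows how neighborhood evidence can change the decision for a user whose personal features are only weakly informative.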
Notes
1. In this paper, we use the Java version of Liblinear, developed by Benedikt Waldvogel, which can be accessed via http://www.bwaldvogel.de/liblinear-java/.
2. We select LibSVM [3] as the implementation of SVM, which can be accessed via http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
References
Antoniades, D., Polakis, I., Kontaxis, G., Athanasopoulos, E., Ioannidis, S., Markatos, E.P., Karagiannis, T.: we.b: the web of short URLs. In: Proceedings of WWW, pp. 715–724 (2011)
Burger, J.D., Henderson, J., Kim, G., Zarrella, G.: Discriminating gender on twitter. In: Proceedings of EMNLP, pp. 1301–1309 (2011)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM TIST 2(3), 27 (2011)
Chaudhari, G., Avadhanula, V., Sarawagi, S.: A few good predictions: selective node labeling in a social network. In: Proceedings of WSDM, pp. 353–362 (2014)
Danescu-Niculescu-Mizil, C., Lee, L., Pang, B., Kleinberg, J.: Echoes of power: language effects and power differences in social interaction. In: Proceedings of WWW, pp. 699–708 (2012)
Dodds, P.S., Harris, K.D., Kloumann, I.M., Bliss, C.A., Danforth, C.M.: Temporal patterns of happiness and information in a global social network: hedonometrics and twitter. PLoS ONE 6(12), e26752 (2011)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: a library for large linear classification. JMLR 9, 1871–1874 (2008)
Feng, W., Wang, J.: Incorporating heterogeneous information for personalized tag recommendation in social tagging systems. In: Proceedings of KDD, pp. 1276–1284 (2012)
Fink, C., Kopecky, J., Morawski, M.: Inferring gender from the content of tweets: a region specific example. In: Proceedings of ICWSM (2012)
Forman, G.: An extensive empirical study of feature selection metrics for text classification. JMLR 3, 1289–1305 (2003)
Golbeck, J., Robles, C., Turner, K.: Predicting personality with social media. In: Proceedings of CHI, pp. 253–262 (2011)
Goswami, S., Sarkar, S., Rustagi, M.: Stylometric analysis of bloggers’ age and gender. In: Proceedings of ICWSM (2009)
Jacob, Y., Denoyer, L., Gallinari, P.: Learning latent representations of nodes for classifying in heterogeneous social networks. In: Proceedings of WSDM, pp. 373–382 (2014)
Kong, X., Cao, B., Yu, P.S.: Multi-label classification by mining label and instance correlations from heterogeneous information networks. In: Proceedings of KDD, pp. 614–622 (2013)
Li, R., Wang, S., Deng, H., Wang, R., Chang, K.C.C.: Towards social user profiling: unified and discriminative influence model for inferring home locations. In: Proceedings of KDD, pp. 1023–1031 (2012)
Liu, Z., Tu, C., Sun, M.: Tag dispatch model with social network regularization for microblog user tag suggestion. In: Proceedings of COLING (2012)
McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27, 415–444 (2001)
Mislove, A., Lehmann, S., Ahn, Y.Y., Onnela, J.P., Rosenquist, J.N.: Understanding the demographics of twitter users. In: Proceedings of ICWSM (2011)
Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know: inferring user profiles in online social networks. In: Proceedings of WSDM, pp. 251–260 (2010)
Newman, M.E.: Modularity and community structure in networks. PNAS 103(23), 8577–8582 (2006)
Rao, D., Yarowsky, D., Shreevats, A., Gupta, M.: Classifying latent user attributes in twitter. In: Proceedings of Workshop on Search and Mining User-Generated Contents, pp. 37–44 (2010)
Sachan, M., Dubey, A., Srivastava, S., Xing, E.P., Hovy, E.: Spatial compactness meets topical consistency: jointly modeling links and content for community detection. In: Proceedings of WSDM, pp. 503–512 (2014)
Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Ramones, S.M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M.E., et al.: Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS ONE 8(9), e73791 (2013)
Volti, R.: An Introduction to the Sociology of Work and Occupations. Pine Forge Press, Thousand Oaks (2011)
Yang, S.H., Long, B., Smola, A., Sadagopan, N., Zheng, Z., Zha, H.: Like like alike: joint friendship and interest propagation in social networks. In: Proceedings of WWW, pp. 537–546 (2011)
Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of ICML, pp. 412–420 (1997)
Zhu, X., Goldberg, A.B.: Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 3(1), 1–130 (2009)
Acknowledgement
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61170196 and 61202140 and the Major Project of the National Social Science Foundation of China under Grant No. 13&ZD190.
Copyright information
© 2015 Springer Science+Business Media Singapore
About this paper
Cite this paper
Tu, C., Liu, Z., Sun, M. (2015). PRISM: Profession Identification in Social Media with Personal Information and Community Structure. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_2
DOI: https://doi.org/10.1007/978-981-10-0080-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0079-9
Online ISBN: 978-981-10-0080-5
eBook Packages: Computer Science, Computer Science (R0)