Abstract
This paper presents a method for classifying social media users by their socioeconomic status. Our experiments are conducted on a curated set of Twitter profiles, where each user is represented by the posted text, topics of discussion, interactive behaviour and estimated impact on the microblogging platform. We first formulate a 3-way classification task, in which users are classified as having an upper, middle or lower socioeconomic status. A nonlinear, generative learning approach using a composite Gaussian Process kernel provides significantly better classification accuracy (\(75\,\%\)) than a competitive linear alternative. When the task is recast as a binary classification (upper vs. middle and lower class), the proposed classifier reaches an accuracy of \(82\,\%\).
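The idea of a composite kernel over several user representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the kernel is composed, so the sum of squared-exponential kernels (one per feature category), the group names, the toy feature values and the hyperparameters below are all assumptions made for illustration. The helper `to_binary` shows the label collapse used for the binary task.

```python
import math

# Hypothetical feature groups mirroring the four user representations named in
# the abstract; group names and vector sizes are illustrative assumptions.
GROUPS = ("text", "topics", "behaviour", "impact")


def se_kernel(x, y, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def composite_kernel(user_a, user_b, params):
    """Sum of one SE kernel per feature group (an assumed composition)."""
    return sum(se_kernel(user_a[g], user_b[g], *params[g]) for g in GROUPS)


def to_binary(label):
    """Collapse the 3-way label into upper vs. (middle or lower)."""
    return "upper" if label == "upper" else "middle_or_lower"


# Toy users with made-up feature values (not taken from the data set).
alice = {"text": [0.2, 0.1], "topics": [0.7, 0.3], "behaviour": [0.5], "impact": [3.1]}
bob = {"text": [0.0, 0.4], "topics": [0.6, 0.4], "behaviour": [0.9], "impact": [1.2]}

# (variance, lengthscale) per group; in a real model these would be learned.
params = {g: (1.0, 1.0) for g in GROUPS}

k_ab = composite_kernel(alice, bob, params)    # similarity between two users
k_aa = composite_kernel(alice, alice, params)  # self-similarity: sum of variances
```

Summing per-group kernels lets each feature category carry its own variance and lengthscale, so a fitted model could weight, say, topics differently from impact. A Gaussian Process classifier would then use the resulting kernel matrix over all training users.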
Notes
1. Inferred from the location name provided in the user profile description.
2. The data set is available at http://dx.doi.org/10.6084/m9.figshare.1619703.
Acknowledgements
This work has been supported by the EPSRC grant EP/K031953/1 (“Early-Warning Sensing Systems for Infectious Diseases”).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lampos, V., Aletras, N., Geyti, J.K., Zou, B., Cox, I.J. (2016). Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_54
DOI: https://doi.org/10.1007/978-3-319-30671-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)