Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

This paper presents a method to classify social media users based on their socioeconomic status. Our experiments are conducted on a curated set of Twitter profiles, where each user is represented by the posted text, topics of discussion, interactive behaviour and estimated impact on the microblogging platform. We first formulate a 3-way classification task, where users are classified as having an upper, middle or lower socioeconomic status. A nonlinear, generative learning approach using a composite Gaussian Process kernel provides significantly better classification accuracy (\(75\,\%\)) than a competitive linear alternative. When the task is turned into a binary classification – upper vs. middle and lower class – the proposed classifier reaches an accuracy of \(82\,\%\).
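The preview does not specify how the composite kernel is built, so the following is only a minimal sketch of one common construction: a sum of squared-exponential (RBF) kernels, one per feature group, plus the label collapse from the 3-way to the binary task. The group names, column ranges, lengthscales and random data are hypothetical and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale ** 2)


def composite_kernel(X1, X2, groups, lengthscales):
    """Sum of one RBF kernel per feature group (an assumed form of 'composite')."""
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for name, cols in groups.items():
        K += rbf_kernel(X1[:, cols], X2[:, cols], lengthscales[name])
    return K


def to_binary(labels):
    """Collapse 3-way labels into upper vs. middle-and-lower."""
    return np.where(np.asarray(labels) == "upper", "upper", "middle_or_lower")


# Hypothetical layout: which columns of the user matrix belong to which view.
groups = {
    "text": slice(0, 3),
    "topics": slice(3, 5),
    "behaviour": slice(5, 7),
    "impact": slice(7, 8),
}
lengthscales = {name: 1.0 for name in groups}

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))  # 6 synthetic users, 8 synthetic features
K = composite_kernel(X, X, groups, lengthscales)
binary_labels = to_binary(["upper", "middle", "lower"])
```

A sum of valid kernels is itself a valid kernel, so `K` can be passed to any Gaussian Process classifier that accepts a precomputed covariance matrix. Giving each feature group its own lengthscale lets the model weight text, topics, behaviour and impact differently.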

Notes

  1. Inferred from the location name provided in the user profile description.

  2. The data set is available at http://dx.doi.org/10.6084/m9.figshare.1619703.

Acknowledgements

This work has been supported by the EPSRC grant EP/K031953/1 (“Early-Warning Sensing Systems for Infectious Diseases”).

Author information

Corresponding author

Correspondence to Vasileios Lampos.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lampos, V., Aletras, N., Geyti, J.K., Zou, B., Cox, I.J. (2016). Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_54

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)
