Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

  • Conference paper
Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, to understand such discussions, it is necessary to identify the coherent topics discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topic coherence and to determine how closely each metric aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.
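
As a rough illustration of the kind of Pointwise Mutual Information-based coherence metric the abstract refers to, the following Python sketch scores a topic by the average pairwise PMI of its top words, with probabilities estimated from a background collection of tweets. The function name `pmi_coherence`, the document-level co-occurrence estimate, the smoothing constant `eps`, and the toy tweets are all assumptions made for illustration; the abstract does not specify the paper's exact formulation.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, background_docs, eps=1e-12):
    """Average pairwise PMI of a topic's words.

    Probabilities are estimated as the fraction of background documents
    (here: tokenised tweets) that contain a word or a pair of words.
    This is an illustrative sketch, not the paper's exact metric.
    """
    docs = [set(doc) for doc in background_docs]
    n_docs = len(docs)
    if n_docs == 0 or len(topic_words) < 2:
        return 0.0

    def prob(*words):
        # Fraction of background documents containing all given words.
        return sum(1 for d in docs if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p1 == 0 or p2 == 0:
            continue  # a word unseen in the background gives no evidence
        # eps keeps the logarithm finite when the pair never co-occurs.
        scores.append(math.log((p12 + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical, hand-made background "tweets" (already tokenised).
tweets = [
    ["election", "vote", "poll"],
    ["election", "vote"],
    ["vote", "poll"],
    ["football", "goal"],
    ["football", "match", "goal"],
    ["coffee", "morning"],
]

coherent_topic = ["election", "vote", "poll"]
mixed_topic = ["election", "goal", "coffee"]

coherent_score = pmi_coherence(coherent_topic, tweets)
mixed_score = pmi_coherence(mixed_topic, tweets)
```

On this toy data the words of `coherent_topic` co-occur in the background tweets while those of `mixed_topic` never do, so the first topic receives the higher score. A pairwise human preference between the two topics could then be compared with the ordering the metric produces.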

Notes

  1. Note that many hashtags are not recorded in Wikipedia.

  2. http://dev.twitter.com.

  3. http://crowdflower.com.

  4. http://mallet.cs.umass.edu.

  5. http://github.com/minghui/Twitter-LDA.

  6. http://goo.gl/JtzJDz.

  7. http://semanticsimilarity.org.

  8. Part of the order matches the order from humans.

Author information

Corresponding author

Correspondence to Anjie Fang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fang, A., Macdonald, C., Ounis, I., Habel, P. (2016). Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_36

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)
