Abstract
Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, in order to understand such discussions, it is necessary to identify coherent topics that have been discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topical coherences and to determine how closely each of the metrics align with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Note that many hashtags are not recorded in Wikipedia.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
Part of the order matches the order from humans.
References
Hong, L., Davison, B.D.: Empirical study of topic modeling in Twitter. In: Proceedings of SOMA (2010)
Zhao, W.X., Jiang, J., Weng, J., He, J., Lim, E.-P., Yan, H., Li, X.: Comparing Twitter and traditional media using topic models. In: Clough, P., Foley, C., Gurrin, C., Jones, G.J.F., Kraaij, W., Lee, H., Mudoch, V. (eds.) ECIR 2011. LNCS, vol. 6611, pp. 338–349. Springer, Heidelberg (2011)
Steyvers, M., Griffiths, T.: Probabilistic topic models. Handb. Latent Semant. Anal. 427(7), 424–440 (2007)
Mei, Q., Shen, X., Zhai, C.: Automatic labeling of multinomial topic models. In: Proceedings of SIGKDD (2007)
Fang, A., Ounis, I., Habel, P., Macdonald, C., Limsopatham, N.: Topic-centric classification of Twitter user’s political orientation. In: Proceedings of SIGIR (2015)
AlSumait, L., Barbará, D., Gentle, J., Domeniconi, C.: Topic significance ranking of LDA generative models. In: Proceedings of ECMLPKDD (2009)
Newman, D., Karimi, S., Cavedon, L.: External evaluation of topic models. In: Proceedings of ADCS (2009)
Newman, D., Lau, J.H., Grieser, K., Baldwin, T.: Automatic evaluation of topic coherence. In: Proceedings of NAACL (2010)
Li, W., McCallum, A.: Pachinko allocation: DAG-structured mixture models of topic correlations. In: Proceedings of ICML (2006)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Li, W., Blei, D., McCallum, A.: Nonparametric bayes pachinko allocation. In: Proceedings of UAI (2007)
Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods for topic models. In: Proceedings of ICML (2009)
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Proceedings of NIPS (2009)
Fellbaum, C.: WordNet. Wiley Online Library, New York (1998)
Leacock, C., Chodorow, M.: Combining local context and WordNet similarity for word sense identification. WordNet Electr. Lexical Database 49(2), 265–283 (1998)
Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of ICRCL (1997)
Lesk, M.: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In: Proceedings of SIGDOC (1986)
Rus, V., Lintean, M.C., Banjade, R., Niraula, N.B., Stefanescu, D.: SEMILAR: the semantic similarity toolkit. In: Proceedings of ACL (2013)
Recchia, G., Jones, M.N.: More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis. Behav. Res. Meth. 41(3), 647–656 (2009)
Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Processes 25(2–3), 259–284 (1998)
Stefănescu, D., Banjade, R., Rus, V.: Latent semantic analysis models on wikipedia and TASA. In: Proceedings of LREC (2014)
Carterette, B., Bennett, P.N., Chickering, D.M., Dumais, S.T.: Here or there. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds.) ECIR 2008. LNCS, vol. 4956, pp. 16–27. Springer, Heidelberg (2008)
Mackie, S., McCreadie, R., Macdonald, C., Ounis, I.: On choosing an effective automatic evaluation metric for microblog summarisation. In: Proceedings of IIiX (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Fang, A., Macdonald, C., Ounis, I., Habel, P. (2016). Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science(), vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_36
Download citation
DOI: https://doi.org/10.1007/978-3-319-30671-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer ScienceComputer Science (R0)