Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

Fang, Anjie; Macdonald, Craig; Ounis, Iadh; Habel, Philip

doi:10.1007/978-3-319-30671-1_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Twitter offers scholars new ways to understand the dynamics of public opinion and social discussion. However, to understand such discussions, it is necessary to identify the coherent topics discussed in the tweets. Several automatic topic coherence metrics have been designed to assess the coherence of topics in classical document corpora, but it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topic coherence and to determine how closely each metric aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest level of agreement with human preferences of topic coherence over two Twitter datasets.
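To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It scores a topic by the average Pointwise Mutual Information (PMI) over all pairs of its top words, with probabilities estimated from a background corpus of tweets, and then measures how often the metric's pairwise preference between two topics matches a human preference. Several details are assumptions made for illustration: probabilities come from per-tweet (document-level) co-occurrence, a tiny smoothing constant `eps` avoids log(0) for unseen pairs, words absent from the background contribute 0, and agreement is a simple fraction of matching pairs. The helper names (`doc_frequencies`, `pmi_coherence`, `pairwise_agreement`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def doc_frequencies(tweets):
    """Count, over a background corpus of tokenised tweets, how many tweets
    contain each word and each unordered word pair (per-tweet co-occurrence)."""
    word_df, pair_df = {}, {}
    for tweet in tweets:
        vocab = set(tweet)
        for w in vocab:
            word_df[w] = word_df.get(w, 0) + 1
        for a, b in combinations(sorted(vocab), 2):
            pair_df[(a, b)] = pair_df.get((a, b), 0) + 1
    return word_df, pair_df, len(tweets)


def pmi_coherence(top_words, word_df, pair_df, n_docs, eps=1e-12):
    """Average PMI(a, b) = log((p(a, b) + eps) / (p(a) * p(b))) over all
    pairs of a topic's top words. eps is an assumed smoothing constant."""
    scores = []
    for a, b in combinations(top_words, 2):
        p_ab = pair_df.get(tuple(sorted((a, b))), 0) / n_docs
        p_a = word_df.get(a, 0) / n_docs
        p_b = word_df.get(b, 0) / n_docs
        if p_a == 0 or p_b == 0:
            scores.append(0.0)  # assumption: words unseen in the background score 0
            continue
        scores.append(math.log((p_ab + eps) / (p_a * p_b)))
    return sum(scores) / len(scores)


def pairwise_agreement(metric_scores, human_prefs):
    """Fraction of judged topic pairs (i, j, preferred) where the metric
    prefers the same topic as the human judges (ties go to topic i)."""
    agree = 0
    for i, j, preferred in human_prefs:
        metric_choice = i if metric_scores[i] >= metric_scores[j] else j
        agree += metric_choice == preferred
    return agree / len(human_prefs)


# Toy background "Twitter" corpus (hypothetical data).
background = [
    ["obama", "president", "white", "house"],
    ["obama", "president", "election"],
    ["football", "goal", "match"],
    ["goal", "match", "referee"],
    ["election", "vote", "president"],
]
word_df, pair_df, n = doc_frequencies(background)

topics = {
    "politics": ["obama", "president", "election"],
    "mixed": ["obama", "goal", "vote"],
}
scores = {name: pmi_coherence(words, word_df, pair_df, n)
          for name, words in topics.items()}

# One hypothetical crowdsourced judgment: humans found "politics" more coherent.
human = [("politics", "mixed", "politics")]
print(scores)
print(pairwise_agreement(scores, human))
```

The word pairs of the "politics" topic co-occur in the background tweets, so they receive positive PMI. The pairs of the "mixed" topic never co-occur, so the smoothing constant drives their PMI strongly negative. The metric therefore prefers the same topic as the hypothetical judges. Using Twitter rather than a classical corpus as the background is the point the abstract makes: co-occurrence statistics then reflect how words are actually used in tweets.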

Notes

1. Note that many hashtags are not recorded in Wikipedia.
2. http://dev.twitter.com.
3. http://crowdflower.com.
4. http://mallet.cs.umass.edu.
5. http://github.com/minghui/Twitter-LDA.
6. http://goo.gl/JtzJDz.
7. http://semanticsimilarity.org.
8. Part of the order matches the order from humans.

Author information

Authors and Affiliations

University of Glasgow, Glasgow, UK
Anjie Fang, Craig Macdonald, Iadh Ounis & Philip Habel

Corresponding author

Correspondence to Anjie Fang.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

About this paper

Cite this paper

Fang, A., Macdonald, C., Ounis, I., Habel, P. (2016). Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_36

DOI: https://doi.org/10.1007/978-3-319-30671-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science (R0)