A Temporal Topic Model for Noisy Mediums

Churchill, Rob; Singh, Lisa; Kirov, Christo

doi:10.1007/978-3-319-93037-4_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Social media and online news content are increasing rapidly. The goal of this work is to identify the topics associated with this content and understand the changing dynamics of these topics over time. We propose the Topic Flow Model (TFM), a graph-theoretic temporal topic model that identifies topics as they emerge and tracks them through time as they persist, diminish, and re-emerge. TFM identifies topic words by capturing the changing relationship strength of words over time, and offers solutions for dealing with flood words, i.e., domain-specific words that pollute topics. An extensive empirical analysis of TFM on Twitter data, newspaper articles, and synthetic data shows that the topic accuracy and signal-to-noise ratio (SNR) of meaningful topic words are better than the existing state of the art.

Notes

1. Another way to simulate this is to sample from a Zipfian distribution. Our data generator allows for distribution changes. For these experiments, we create a mixture that is noisier and harder to generate topics from than a Zipfian sample.
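
To make the Zipfian alternative mentioned in this note concrete, the following is a minimal sketch of drawing synthetic words from a Zipfian distribution. It is not the authors' data generator: the names `zipf_weights` and `sample_document`, the vocabulary `w1`..`w1000`, and the exponent of 1.0 are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def zipf_weights(vocab_size, exponent=1.0):
    """Unnormalized Zipfian weights: the word at rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]


def sample_document(vocab, length, exponent=1.0, rng=None):
    """Draw `length` words from `vocab`, treating list order as frequency rank.

    Hypothetical helper for illustration only; not the paper's generator.
    """
    if rng is None:
        rng = random.Random()
    weights = zipf_weights(len(vocab), exponent)
    # random.choices normalizes the weights internally and samples with replacement.
    return rng.choices(vocab, weights=weights, k=length)


# Illustrative vocabulary: "w1" is the most frequent rank, "w1000" the rarest.
vocab = [f"w{i}" for i in range(1, 1001)]
rng = random.Random(0)  # fixed seed so the sample is reproducible
doc = sample_document(vocab, 5000, rng=rng)
counts = Counter(doc)
```

In such a sample a handful of top-ranked words dominate `counts` while most of the vocabulary appears rarely or not at all, which is the long-tailed shape the note contrasts with the noisier mixture used in the experiments.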

Acknowledgements

This work was supported by the Massive Data Institute (MDI) at Georgetown University.

Author information

Authors and Affiliations

Georgetown University, Washington, D.C., USA
Rob Churchill, Lisa Singh & Christo Kirov

Corresponding author

Correspondence to Rob Churchill.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Churchill, R., Singh, L., Kirov, C. (2018). A Temporal Topic Model for Noisy Mediums. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_4

DOI: https://doi.org/10.1007/978-3-319-93037-4_4
Published: 20 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)
