Poster | DOI: 10.1145/2020408.2020551

A time-dependent topic model for multiple text streams

Published: 21 August 2011

ABSTRACT

In recent years, social media have become indispensable tools for information dissemination, operating in tandem with traditional media outlets such as newspapers, and it has become critical to understand the interaction between the new and old sources of news. Although both social media and traditional media have attracted attention from several research communities, most prior work has been limited to a single medium. In addition, temporal analysis of these sources can show how information spreads and evolves. Modeling temporal dynamics while considering multiple sources is a challenging research problem. In this paper, we address the problem of modeling text streams from two news sources, Twitter and Yahoo! News. Our analysis addresses both their individual properties (including temporal dynamics) and their inter-relationships. This work extends standard topic models by allowing each text stream to have both local topics and shared topics. For temporal modeling, we associate each topic with a time-dependent function that characterizes its popularity over time. By integrating the two models, we model the temporal dynamics of multiple correlated text streams in a unified framework. We evaluate our model on a large-scale dataset consisting of Twitter text streams and Yahoo! News feeds. In addition to overcoming the limitations of existing models, we show that our model achieves lower (that is, better) perplexity on unseen data and identifies more coherent topics. We also analyze how the topics obtained by our model can be used to find real-world events.
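The abstract combines two ideas: each stream draws on topics shared across streams plus its own local topics, and each topic carries a time-dependent function describing its popularity. The sketch below illustrates one way such a generative process could look, together with the standard held-out perplexity computation the evaluation refers to. It is not the authors' model: the Gaussian-shaped popularity curve, the rule that topic proportions at time t are simply the normalized popularities, the hyperparameter values, the toy sizes, and the helper names (`popularity`, `new_topic`, `build_model`, `topic_weights`, `sample_document`, `perplexity`) are all hypothetical assumptions made for illustration. It also omits inference entirely.

```python
import math
import random

# Toy sizes chosen for illustration only; they are not from the paper.
VOCAB = 50
N_SHARED = 3                  # topics shared by both streams
N_LOCAL = 2                   # topics private to each stream
STREAMS = ("twitter", "news")


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet sample built from normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def popularity(t, peak, width):
    """ASSUMED popularity curve: an unnormalized Gaussian bump over time.
    The paper's actual time-dependent function may have a different form."""
    return math.exp(-0.5 * ((t - peak) / width) ** 2)


def new_topic(rng):
    """A topic = word distribution + parameters of its popularity curve."""
    return {"words": dirichlet(0.5, VOCAB, rng),
            "peak": rng.uniform(0.0, 10.0),
            "width": rng.uniform(0.5, 3.0)}


def build_model(rng):
    """Every stream sees the same shared topic objects plus its own local topics."""
    shared = [new_topic(rng) for _ in range(N_SHARED)]
    return {s: shared + [new_topic(rng) for _ in range(N_LOCAL)] for s in STREAMS}


def topic_weights(topics, t):
    """Topic proportions at time t: popularity scores normalized to sum to 1.
    The small constant keeps the weights valid if every curve underflows."""
    scores = [popularity(t, k["peak"], k["width"]) + 1e-12 for k in topics]
    total = sum(scores)
    return [x / total for x in scores]


def sample_document(model, stream, t, length, rng):
    """Draw a topic for each token from the time-dependent proportions,
    then draw the word from that topic's word distribution."""
    topics = model[stream]
    weights = topic_weights(topics, t)
    doc = []
    for _ in range(length):
        k = rng.choices(range(len(topics)), weights=weights)[0]
        doc.append(rng.choices(range(VOCAB), weights=topics[k]["words"])[0])
    return doc


def perplexity(model, held_out):
    """exp(-sum(log p(w)) / N) over held-out (stream, time, words) triples,
    where p(w) mixes the stream's topics with their weights at that time."""
    log_lik, n_tokens = 0.0, 0
    for stream, t, words in held_out:
        topics = model[stream]
        weights = topic_weights(topics, t)
        for w in words:
            log_lik += math.log(sum(p * k["words"][w] for p, k in zip(weights, topics)))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    model = build_model(rng)
    held_out = [(s, t, sample_document(model, s, t, 20, rng))
                for s in STREAMS for t in (1.0, 5.0, 9.0)]
    print(f"held-out perplexity: {perplexity(model, held_out):.2f}")
```

Sharing the same topic objects between streams is what lets a single news event surface in both Twitter and Yahoo! News, while local topics absorb stream-specific vocabulary. Because every per-token probability is at most 1, perplexity is always at least 1, and lower values mean the model predicts unseen text better. In a real evaluation, the parameters would be fitted on training documents and scored on genuinely held-out ones rather than on samples from the model itself.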

      Published in

        ACM Conferences
        KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2011
        1446 pages
        ISBN:9781450308137
        DOI:10.1145/2020408

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

        Overall acceptance rate: 1,133 of 8,635 submissions (13%)
