DOI: 10.1145/2124295.2124376
research-article

Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization

Published: 08 February 2012

ABSTRACT

As massive repositories of real-time human commentary, social media platforms have arguably evolved far beyond passive facilitation of online social interactions. Rapid analysis of the information content of online social media streams (news articles, blogs, tweets, etc.) is increasingly important because it allows business and government bodies to understand public opinion about products and policies. In most of these settings, data points appear as a stream of high-dimensional feature vectors. Guided by real-world industrial deployment scenarios, we revisit the problem of online learning of topics from streaming social media content. On the one hand, the topics need to be dynamically adapted to the statistics of incoming data points; on the other hand, early detection of rising new trends is important in many applications. We propose an online nonnegative matrix factorization (NMF) framework that captures the evolution and emergence of themes in unstructured text under a novel form of temporal regularization. We develop scalable optimization algorithms for our framework, propose a new set of evaluation metrics, and report promising empirical results on traditional Topic Detection and Tracking (TDT) tasks as well as on streaming Twitter data. Our system rapidly captures emerging themes, tracks existing topics over time while maintaining temporal consistency and continuity in user views, and can be explicitly configured to bound the amount of information presented to the user.
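The abstract does not state the objective function or the update rules, so the following is only a generic sketch of the idea it describes, not the authors' algorithm. It assumes a batch of documents `X` (documents by words), a set of previously learned topics `H_prev`, a proximity penalty `mu * ||H_evolving - H_prev||_F^2` that keeps evolving topics close to their earlier versions, and a few unconstrained rows for emerging topics. The helper name `update_topics`, its parameters, and the use of multiplicative updates are all assumptions made for illustration.

```python
import numpy as np


def update_topics(X, H_prev, n_emerging=1, mu=1.0, n_iter=300, seed=0, eps=1e-9):
    """Factor X (docs x words) ~ W @ H with nonnegative W and H.

    Hypothetical helper for illustration only. The first rows of H are
    "evolving" topics, warm-started from H_prev and penalized by
    mu * ||H_evolving - H_prev||_F^2. The remaining rows are free
    "emerging" topics with no penalty.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    n_evolving = H_prev.shape[0]
    k = n_evolving + n_emerging

    # Adding eps keeps warm-started zero entries from being frozen at zero,
    # since multiplicative updates can never move an exact zero.
    H = np.vstack([H_prev + eps, rng.random((n_emerging, n_words))])
    target = np.vstack([H_prev, np.zeros((n_emerging, n_words))])
    mask = np.zeros((k, 1))
    mask[:n_evolving] = 1.0  # the penalty applies to evolving rows only
    W = rng.random((n_docs, k))

    for _ in range(n_iter):
        # Standard multiplicative update for the document-topic weights.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Multiplicative update for topics, with the proximity penalty
        # split into its positive (mu * H) and negative (mu * target) parts.
        H *= (W.T @ X + mu * mask * target) / (W.T @ W @ H + mu * mask * H + eps)
    return W, H


# Toy batch: five documents on an old theme (words 0-2) and five on a
# new theme (words 3-5) that H_prev does not cover.
old = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0])
new = np.array([0.0, 0.0, 0.0, 3.0, 3.0, 3.0])
X = np.vstack([np.tile(old, (5, 1)), np.tile(new, (5, 1))])
H_prev = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]])

W, H = update_topics(X, H_prev, n_emerging=1)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In this toy setup, row 0 of `H` plays the role of the tracked topic and row 1 the emerging one. A streaming version would call the helper once per time window and pass the previous window's topics as `H_prev`; how the paper itself regularizes over time is not recoverable from the abstract.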


Published in

WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining
February 2012
792 pages
ISBN: 9781450307475
DOI: 10.1145/2124295

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 498 of 2,863 submissions (17%)
