ABSTRACT
As massive repositories of real-time human commentary, social media platforms have arguably evolved far beyond passive facilitation of online social interactions. Rapid analysis of the information content of online social media streams (news articles, blogs, tweets, etc.) is increasingly important, as it allows businesses and government bodies to understand public opinion about products and policies. In most of these settings, data points appear as a stream of high-dimensional feature vectors. Guided by real-world industrial deployment scenarios, we revisit the problem of online learning of topics from streaming social media content. On the one hand, the topics need to be dynamically adapted to the statistics of incoming data points; on the other hand, early detection of rising new trends is important in many applications. We propose an online nonnegative matrix factorization framework that captures the evolution and emergence of themes in unstructured text under a novel form of temporal regularization. We develop scalable optimization algorithms for our framework, propose a new set of evaluation metrics, and report promising empirical results on traditional TDT tasks as well as streaming Twitter data. Our system rapidly captures emerging themes, tracks existing topics over time while maintaining temporal consistency and continuity in user views, and can be explicitly configured to bound the amount of information presented to the user.
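To make the idea of temporally regularized nonnegative matrix factorization concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes a document-term matrix `X` for the current time window and a topic matrix `H_prev` from the previous window, and it penalizes the squared Frobenius distance between the new topics and `H_prev`. The function names, the parameter `mu`, the choice of penalty, and the multiplicative-update solver are all assumptions made for illustration; the abstract does not specify the regularizer or the optimizer, and this sketch covers only evolving topics, not the detection of emerging ones.

```python
import numpy as np


def objective(X, W, H, H_prev, mu):
    """Reconstruction error plus the (assumed) temporal penalty on the topics."""
    fit = np.linalg.norm(X - W @ H) ** 2
    drift = np.linalg.norm(H - H_prev) ** 2
    return fit + mu * drift


def temporal_nmf(X, H_prev, mu=1.0, n_iter=50, seed=0, eps=1e-9):
    """Factor X (docs x terms) as W @ H with W, H >= 0, keeping H near H_prev.

    Hypothetical helper: minimizes ||X - W H||_F^2 + mu * ||H - H_prev||_F^2
    using multiplicative updates, which preserve nonnegativity.
    """
    rng = np.random.default_rng(seed)
    H_prev = np.asarray(H_prev, dtype=float)
    k = H_prev.shape[0]
    W = rng.random((X.shape[0], k))  # document-to-topic weights
    H = H_prev.copy()                # warm-start topics from the previous window
    for _ in range(n_iter):
        # Standard multiplicative update for W with H held fixed.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Update for H; the mu terms pull each topic toward its previous value.
        H *= (W.T @ X + mu * H_prev) / (W.T @ W @ H + mu * H + eps)
    return W, H


# Illustrative usage on synthetic data (20 documents, 30 terms, 4 topics).
rng = np.random.default_rng(1)
X = rng.random((20, 30))
H_prev = rng.random((4, 30))
W, H = temporal_nmf(X, H_prev, mu=5.0)
```

In this sketch, a larger `mu` keeps the topics closer to those of the previous window, which is one simple way to trade fidelity to new data against continuity over time.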
Index Terms
- Learning evolving and emerging topics in social media: a dynamic nmf approach with temporal regularization