research-article

Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization

Authors:
Ankan Saha

University of Chicago, Chicago, IL, USA

Vikas Sindhwani

IBM T.J. Watson Research Center, Yorktown, NY, USA


WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining, February 2012, Pages 693–702, https://doi.org/10.1145/2124295.2124376

Published: 08 February 2012

ABSTRACT

As massive repositories of real-time human commentary, social media platforms have arguably evolved far beyond passive facilitation of online social interactions. Rapid analysis of the information content in online social media streams (news articles, blogs, tweets, etc.) is urgently needed, since it allows businesses and government bodies to understand public opinion about products and policies. In most of these settings, data points arrive as a stream of high-dimensional feature vectors. Guided by real-world industrial deployment scenarios, we revisit the problem of online learning of topics from streaming social media content. On the one hand, the topics need to be dynamically adapted to the statistics of incoming data points; on the other hand, early detection of newly rising trends is important in many applications. We propose an online nonnegative matrix factorization framework, built on a novel temporal regularization scheme, that captures the evolution and emergence of themes in unstructured text. We develop scalable optimization algorithms for our framework, propose a new set of evaluation metrics, and report promising empirical results on traditional Topic Detection and Tracking (TDT) tasks as well as on streaming Twitter data. Our system rapidly captures emerging themes, tracks existing topics over time while maintaining temporal consistency and continuity in user views, and can be explicitly configured to bound the amount of information presented to the user.
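The abstract combines two ideas: existing topics should evolve smoothly from one time step to the next, while room is left for new topics to emerge. As a rough illustration only, the following minimal sketch shows one time step of a generic temporally regularized NMF in Python with NumPy; it is not the authors' algorithm, and their regularizers and solvers may differ. It assumes a nonnegative term-by-document matrix `X` for the current batch and the previous topic dictionary `W_prev`. The function name `temporally_regularized_nmf_step` and the parameters `mu` (a penalty that keeps the existing "evolving" topic columns close to `W_prev`) and `k_new` (the number of unregularized "emerging" topic columns) are hypothetical. The sketch fits both factors with standard Lee-Seung style multiplicative updates.

```python
import numpy as np


def temporally_regularized_nmf_step(X, W_prev, k_new=2, mu=1.0, n_iter=200,
                                    eps=1e-10, seed=0):
    """Hypothetical single time step of online NMF with temporal regularization.

    Approximately minimizes
        ||X - W H||_F^2 + mu * ||W[:, :k] - W_prev||_F^2   s.t. W >= 0, H >= 0,
    where the first k columns of W (evolving topics) are pulled toward the
    previous dictionary W_prev and the last k_new columns (emerging topics)
    are left unregularized.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    k = W_prev.shape[1]

    # Warm-start evolving topics from the previous step; emerging topics start
    # random. Adding eps keeps multiplicative updates from getting stuck at 0.
    W = np.hstack([W_prev, rng.random((n_terms, k_new))]) + eps
    H = rng.random((k + k_new, n_docs))

    # Per-column penalty weights: mu for evolving topics, 0 for emerging ones.
    mu_cols = np.concatenate([np.full(k, mu), np.zeros(k_new)])
    W_anchor = np.hstack([W_prev, np.zeros((n_terms, k_new))])

    for _ in range(n_iter):
        # Standard multiplicative update for H under the Frobenius loss.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Multiplicative update for W: the gradient is split into its positive
        # part (W H H^T + mu W) and negative part (X H^T + mu W_prev), so the
        # ratio is nonnegative and W stays nonnegative.
        W *= (X @ H.T + mu_cols * W_anchor) / (W @ H @ H.T + mu_cols * W + eps)

    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((30, 50))       # toy batch: 30 terms x 50 documents
    W_prev = rng.random((30, 3))   # toy dictionary of 3 topics from step t-1
    W, H = temporally_regularized_nmf_step(X, W_prev, k_new=2, mu=5.0)
    print(W.shape, H.shape)        # (30, 5) (5, 50)
```

Because the penalty is separable across columns, applying it through a per-column weight vector lets one update rule handle both topic types. A larger `mu` makes existing topics change more slowly between steps, while setting `mu=0` reduces the step to ordinary NMF warm-started from the previous dictionary.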

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining
February 2012
792 pages
ISBN:9781450307475
DOI:10.1145/2124295
General Chairs:
Eytan Adar, University of Michigan, USA
Jaime Teevan, Microsoft Research, USA
Program Chairs:
Eugene Agichtein, Emory University, USA
Yoelle Maarek, Yahoo! Research, Israel
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dictionary learning
nmf
time series analysis
topic models

Acceptance Rates
Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Article Metrics
- Total Citations: 90
- Total Downloads: 1,270
- Downloads (Last 12 months): 78
- Downloads (Last 6 weeks): 8