Ranking and tagging bursty features in text streams with context language models

Zhao, Wayne Xin; Liu, Chen; Wen, Ji-Rong; Li, Xiaoming

doi:10.1007/s11704-016-5144-z

Research Article
Published: 29 June 2016

Volume 11, pages 852–862, (2017)

Frontiers of Computer Science

Wayne Xin Zhao^1,2, Chen Liu^3, Ji-Rong Wen^1,2 & Xiaoming Li^4

Abstract

Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models, based on sentence-level context and document-level context. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by their newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.

References

Kleinberg J. Bursty and hierarchical structure in streams. Data Mining and Knowledge Discovery, 2003, 7(4): 373–397
Vlachos M, Meek C, Vagena Z, Gunopulos D. Identifying similarities, periodicities and bursts for online search queries. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2004, 131–142
Fung G P C, Yu J X, Yu P S, Lu H. Parameter free bursty events detection in text streams. In: Proceedings of the 31st International Conference on Very Large Data Bases. 2005, 181–192
He Q, Chang K Y, Lim E P. Analyzing feature trajectories for event detection. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007, 207–214
He Q, Chang K Y, Lim E P, Zhang J. Bursty feature representation for clustering text streams. In: Proceedings of the 2007 SIAM Conference on Data Mining. 2007, 491–496
Lappas T, Arai B, Platakis M, Kotsakos D, Gunopulos D. On burstiness-aware search for document sequences. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009, 477–486
Fung G P C, Yu X J, Liu H, Yu P S. Time-dependent event hierarchy construction. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007, 300–309
Parikh N, Sundaresan N. Scalable and near real-time burst detection from ecommerce queries. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 972–980
Kumar R, Novak J, Raghavan P, Tomkins A. On the bursty evolution of blogspace. In: Proceedings of the 12th International Conference on World Wide Web. 2003, 568–576
Wang X H, Zhai C X, Hu X, Sproat R. Mining correlated bursty topic patterns from coordinated text streams. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007, 784–793
Jiang Y L, Lin C X, Mei Q Z. Context comparison of bursty events in web search and online media. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 2010, 1077–1087
Yao J J, Cui B, Huang Y X, Jin X. Temporal and social context based burst detection from folksonomies. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence. 2010, 1474–1479
Mei Q Z, Xin D, Cheng H, Han J W, Zhai C X. Generating semantic annotations for frequent patterns with context analysis. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006, 337–346
Mei Q Z, Shen X H, Zhai C X. Automatic labeling of multinomial topic models. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007, 490–499
Zhai C X. Statistical language models for information retrieval: a critical review. Foundations and Trends in Information Retrieval, 2008
Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation. The Journal of Machine Learning Research, 2003, 3: 993–1022
Zhai C, Lafferty J. Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of the 10th International Conference on Information and Knowledge Management. 2001, 403–410
Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 1977, 39(1): 1–38

Acknowledgements

The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China (Grant No. 61502502), the National Basic Research Program (973 Program) of China (2014CB340403), Beijing Natural Science Foundation (4162032), and the Open Fund of Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, China.

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, 100872, China
Wayne Xin Zhao & Ji-Rong Wen
Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China, Beijing, 100872, China
Wayne Xin Zhao & Ji-Rong Wen
Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing, 100144, China
Chen Liu
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Xiaoming Li

Corresponding author

Correspondence to Wayne Xin Zhao.

Additional information

Wayne Xin Zhao is currently an assistant professor at the School of Information, Renmin University of China, China. He received his PhD degree from Peking University, China in 2014. He has published several refereed papers in international conferences and journals such as ACL, EMNLP, COLING, ECIR, CIKM, SIGIR, SIGKDD, ACM TOIS, ACM TIST, and IEEE TKDE. His research interests are web text mining and natural language processing.

Chen Liu is an associate professor at the Research Center for Cloud Computing, North China University of Technology, China. He received his PhD degree in computer science and technology from the Chinese Academy of Sciences, China in 2007. His research interests include data integration, service modeling, service composition, and cloud computing.

Ji-Rong Wen is a professor at the School of Information, Renmin University of China, China. Before that, he had been a senior researcher and group manager of the Web Search and Mining Group at Microsoft Research Asia (MSRA) since 2008. He has published extensively in prestigious international conferences and journals and has served as a program committee member or chair for many international conferences. He was the chair of the WWW in China track of the 17th World Wide Web Conference. He is currently an associate editor of ACM Transactions on Information Systems (TOIS).

Xiaoming Li is a professor at the School of Electronics Engineering and Computer Science and the director of the Institute of Network Computing and Information Systems at Peking University, China. He is a senior member of IEEE and currently serves as vice president of the China Computer Federation. His research interests include search engines and web mining, and web technology enabled social sciences.

Electronic supplementary material

Supplementary material, approximately 229 KB.

About this article

Cite this article

Zhao, W.X., Liu, C., Wen, JR. et al. Ranking and tagging bursty features in text streams with context language models. Front. Comput. Sci. 11, 852–862 (2017). https://doi.org/10.1007/s11704-016-5144-z

Received: 14 April 2015
Accepted: 01 December 2015
Published: 29 June 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11704-016-5144-z
