DOI: 10.1145/3538712.3538728
short-paper

News topic detection based on the principle of minimum entropy

Published: 23 August 2022

Abstract

Current topic detection methods generally apply various algorithms to aggregate extracted features into topics, but they do not fully exploit the characteristics of news texts or the key elements of news. To address this, we propose a news topic detection model based on the principle of minimum entropy (ME-NTD). First, we extract news keywords with TextRank and extract news entity elements with text lexical analysis tools. Then, we map news texts to text vectors through word embedding, compute text association weights by incorporating the news entity elements, and construct a text association graph from the text vectors and the association weights. Finally, following the idea of hierarchical coding under the principle of minimum entropy, we perform random walks over the nodes of the text association graph to obtain a hierarchical coding of topics and of the objects within each topic. Experimental results on three news datasets show that ME-NTD outperforms the comparison methods in both effectiveness and efficiency.
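TextRank, which the abstract names for keyword extraction, ranks words by running a PageRank-style update over a word co-occurrence graph. The following is a minimal stdlib-only sketch, not the authors' code. It assumes the tokens are already segmented and filtered (for example, stop words removed), uses illustrative defaults (window 2, damping 0.85), and omits TextRank's post-processing step that merges adjacent keywords into phrases.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, max_iter=100, tol=1e-6, top_k=5):
    """TextRank-style keyword ranking over a word co-occurrence graph.

    Assumes `tokens` is already segmented and filtered (for example, stop
    words removed); the abstract does not specify the preprocessing.
    """
    # Link every pair of distinct words that co-occur within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    nodes = list(neighbors)
    if not nodes:
        return []

    # PageRank update on the undirected, unweighted graph:
    # S(w) = (1 - d) + d * sum over neighbours v of S(v) / deg(v)
    score = {w: 1.0 for w in nodes}
    for _ in range(max_iter):
        new_score = {
            w: (1 - damping) + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in nodes
        }
        converged = max(abs(new_score[w] - score[w]) for w in nodes) < tol
        score = new_score
        if converged:
            break

    # Highest score first; exactly equal scores are ordered alphabetically.
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]


tokens = ["earthquake", "rescue", "earthquake", "city", "earthquake", "damage"]
print(textrank_keywords(tokens, top_k=1))  # ['earthquake']
```

In this toy input "earthquake" is linked to every other word, so it receives the highest score.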
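The abstract does not give the formula for the text association weights, so the following sketch is a hypothetical illustration of how text vectors and entity elements could be combined into a weighted text association graph. It assumes each text vector is the plain mean of its word embeddings and scales the cosine similarity of two texts by the Jaccard overlap of their entity sets; the weighting formula, `alpha`, `threshold`, and the toy embeddings are all assumptions, not the ME-NTD definitions.

```python
import math
from itertools import combinations


def text_vector(tokens, embeddings):
    """Plain mean of the word vectors of in-vocabulary tokens (None if none are known)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_association_graph(docs, embeddings, alpha=1.0, threshold=0.5):
    """Weighted document graph as {(i, j): weight} for i < j.

    docs is a list of (tokens, entity_set) pairs. The weight
    cos(v_i, v_j) * (1 + alpha * Jaccard(E_i, E_j)) is a HYPOTHETICAL choice
    for illustration; pairs whose weight falls below `threshold` get no edge.
    """
    vectors = [text_vector(tokens, embeddings) for tokens, _ in docs]
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        if vectors[i] is None or vectors[j] is None:
            continue
        weight = cosine(vectors[i], vectors[j]) * (1 + alpha * jaccard(docs[i][1], docs[j][1]))
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


# Toy 2-d embeddings and entity sets, invented for this example.
embeddings = {"quake": [1.0, 0.0], "rescue": [0.9, 0.1], "election": [0.0, 1.0], "vote": [0.1, 0.9]}
docs = [
    (["quake", "rescue"], {"Turkey"}),
    (["quake"], {"Turkey"}),
    (["election", "vote"], {"Senate"}),
]
print(build_association_graph(docs, embeddings))  # only the (0, 1) edge survives
```

The two earthquake texts share both vocabulary and an entity, so their edge weight is close to 2; the election text falls below the threshold and stays unconnected.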
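The abstract does not spell out its minimum entropy objective, but hierarchical coding of random walks on a graph matches the two-level map equation used by Infomap. Assuming that formulation, the sketch below scores a given partition of an undirected weighted graph in bits: the module codebook plays the role of topics, and each module's internal codebook encodes the nodes (texts) within that topic. A lower code length L(M) is better. The search over candidate partitions, such as Infomap's greedy procedure, is omitted, and the node and module names are illustrative.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, partition):
    """Two-level map equation L(M) in bits for an undirected weighted graph.

    edges: {(u, v): weight}; partition: {node: module_id}.
    Assumes a random walk without teleportation, so node visit rates are
    proportional to node strength.
    """
    total = 2.0 * sum(edges.values())  # 2W: each edge counted from both ends
    strength = defaultdict(float)
    exit_w = defaultdict(float)        # weight of edges leaving each module
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if partition[u] != partition[v]:
            exit_w[partition[u]] += w
            exit_w[partition[v]] += w

    p = {n: s / total for n, s in strength.items()}  # node visit rates
    q = {m: w / total for m, w in exit_w.items()}    # module exit rates
    module_flow = defaultdict(float)
    for n, pn in p.items():
        module_flow[partition[n]] += pn
    modules = set(partition[n] for n in p)

    # Expanded form: plogp(q) - 2*sum plogp(q_i) - sum plogp(p_a) + sum plogp(q_i + sum p_a)
    q_total = sum(q.values())
    return (plogp(q_total)
            - 2 * sum(plogp(qm) for qm in q.values())
            - sum(plogp(pn) for pn in p.values())
            + sum(plogp(q.get(m, 0.0) + module_flow[m]) for m in modules))


# Two triangles joined by a single bridge edge (2, 3).
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
one_module = {n: 0 for n in range(6)}
two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(map_equation(edges, one_module), 3))   # 2.557
print(round(map_equation(edges, two_modules), 3))  # 2.321
```

Splitting the graph at the bridge shortens the description of the walk, which is why a minimum-entropy criterion of this kind favours the two-triangle partition; with a single module the code length reduces to the entropy of the node visit rates.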

Published In

ACM Other conferences
SSDBM '22: Proceedings of the 34th International Conference on Scientific and Statistical Database Management
July 2022
201 pages
ISBN: 9781450396677
DOI: 10.1145/3538712
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SSDBM 2022

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 35
  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 13 Feb 2025
