DOI: 10.1145/3560442.3560451
research-article

Topic Tracking Algorithm Based on Topic Structure Characteristics

Published: 10 October 2022

Abstract

Topic tracking is used for public opinion monitoring, and its key technology is text classification. However, existing text classification algorithms need a large training corpus, whereas the topic tracking task provides only a small one, so they perform poorly on it. We analyze the story descriptions in the training corpus and find that reports on the same topic share highly similar topic structure characteristics. We use topic information to represent these characteristics, which compensates for the shortage of training data, and fuse them with a text classification algorithm so that tracking takes topic structure into account. We call the resulting method the Topic Tracking Algorithm based on Topic Structure characteristics (TTATS). To verify its performance, we carry out quantitative and qualitative experiments. The results, across multiple dimensions, show that TTATS achieves favorable topic tracking performance.
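
The abstract does not say how TTATS represents topic structure or how the fusion is carried out. Purely as an illustration of the general idea, the Python sketch below assumes that topic information is a term-frequency centroid built from the few on-topic training stories, that similarity is cosine similarity, and that fusion is a weighted sum with an externally supplied classifier score. All function names, the weight `alpha`, and the threshold are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term frequencies (whitespace tokenization, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_centroid(stories):
    """Sum the term frequencies of the on-topic training stories."""
    total = Counter()
    for story in stories:
        total.update(tf_vector(story))
    return total


def fused_score(story, centroid, classifier_score, alpha=0.5):
    """Hypothetical fusion: weighted sum of topic similarity and classifier score.

    classifier_score is assumed to be a probability in [0, 1] produced by
    any text classifier trained on the small corpus.
    """
    similarity = cosine(tf_vector(story), centroid)
    return alpha * similarity + (1 - alpha) * classifier_score


def track(story, centroid, classifier_score, alpha=0.5, threshold=0.5):
    """Mark a story as on-topic when the fused score reaches the threshold."""
    return fused_score(story, centroid, classifier_score, alpha) >= threshold


# Toy usage with made-up stories and a made-up classifier score.
centroid = topic_centroid(["flood hits river town", "river flood rescue continues"])
print(track("flood rescue in river town", centroid, classifier_score=0.4))
print(track("football cup final tonight", centroid, classifier_score=0.4))
```

In this sketch the similarity term lets a story that closely matches the known topic stories pass the threshold even when a classifier trained on few examples is unsure, which is the kind of compensation the abstract describes.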

Published In

HPCCT '22: Proceedings of the 2022 6th High Performance Computing and Cluster Technologies Conference
July 2022
68 pages
ISBN:9781450396646
DOI:10.1145/3560442
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Story
  2. Topic
  3. Topic detection and tracking (TDT)
  4. Topic structure characteristics
  5. Topic tracking

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the National Natural Science Foundation of China
  • the Langfang Science and Technology Support Plan

Conference

HPCCT 2022
