ABSTRACT
Topic Detection and Tracking (TDT) is a popular topic clustering method in the big data age. It aims to automatically recognize new topics and continuously track known topics in a news stream. Traditional TDT mainly studies short text, but with the rapid development of digital devices and communication techniques, news articles are becoming longer and richer. As a result, traditional TDT now faces three problems. First, a long news text usually contains multiple topics, which traditional clustering algorithms cannot identify accurately. Second, traditional clustering mostly relies on multi-dimensional computation over a bag-of-words representation, and the time cost of this computation increases exponentially with the length and number of articles. Third, long news texts contain more information, so presenting their continuity and relevance well is both important and meaningful. This paper therefore presents an improved clustering algorithm based on single-pass clustering that addresses these problems. Experiments show that, compared with the K-means clustering algorithm, the agglomerative hierarchical clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and hierarchical clustering on a constructed concept graph, the algorithm improves accuracy by about 20% to 30%, increases recall by 10% to 20%, and reduces running time by more than 40%. As the number of articles grows, the running-time curve of the improved single-pass clustering algorithm is approximately linear, and each additional article costs only 0.1 to 0.5 times as much time as it does with the other algorithms. In addition, by adding timelines and extracting the topics within each theme at presentation time, the algorithm can effectively mine the continuity and relevance of news topics and track how they change.
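For orientation, the plain single-pass clustering that the abstract builds on can be sketched as follows. This is a minimal illustration of the textbook baseline, not the improved algorithm presented in the paper: the whitespace tokenizer, the summed bag-of-words centroids, the cosine similarity measure, the `threshold` value of 0.3, and the sample headlines are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document to the most similar cluster, or open a new one.

    Returns one cluster label per document, in input order.
    The threshold is an assumed value chosen for illustration.
    """
    centroids = []  # one summed bag-of-words Counter per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())  # naive whitespace tokenizer
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(vec)  # treat the document as a new topic
            labels.append(len(centroids) - 1)
    return labels


# Hypothetical sample headlines, invented for this example.
docs = [
    "earthquake hits coastal city",
    "coastal city earthquake rescue continues",
    "central bank raises interest rates",
    "interest rates rise again as central bank acts",
]
print(single_pass(docs))
```

Each article is read once and compared only against the centroids of clusters that already exist, so new articles can be processed as they arrive in the stream without re-clustering earlier ones.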
Index Terms
- An Improved Clustering Algorithm based on Single-pass