Abstract
Analyzing topic evolution is an effective way to monitor how topics spread. Existing methods focus either on the evolution of topic intensity along a timeline or on the evolution paths of topics in technical literature. In this paper, we study topic evolution from a micro perspective, which captures not only the topic timeline but also topic status and the directed evolutionary paths among topics. First, we construct a word network from the co-occurrence relationships between feature words. Second, we use the latent Dirichlet allocation (LDA) model to extract topics automatically and capture the mapping between words and topics, and then build a ‘word-topic’ coupling network. Third, based on the ‘word-topic’ coupling network, we describe how topic intensity evolves over time and measure topic status by considering the contribution of feature words to each topic. We propose the concept of topic drifting probability to identify evolutionary paths. Experiments on two real-world “COVID-19” data sets demonstrate the effectiveness of the proposed method.
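The first two steps of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cooccurrence_network` and `coupling_edges`, the document-level co-occurrence window, the `threshold` parameter, and the toy inputs are all assumptions made for illustration, and the topic-word weights are written by hand in place of the output of a trained LDA model.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs):
    """Count how often each unordered word pair appears in the same document."""
    edges = Counter()
    for tokens in docs:
        # Use the set of distinct words so a pair counts once per document.
        for u, v in combinations(sorted(set(tokens)), 2):
            edges[(u, v)] += 1
    return edges


def coupling_edges(topic_word, threshold=0.1):
    """Link a word to a topic when its weight in that topic reaches the threshold.

    The threshold rule is an assumption; the abstract does not state how
    word-topic links are selected.
    """
    return {
        (word, topic): weight
        for topic, weights in topic_word.items()
        for word, weight in weights.items()
        if weight >= threshold
    }


# Toy, hand-made inputs (not data from the paper).
docs = [
    ["vaccine", "trial", "dose"],
    ["vaccine", "dose", "lockdown"],
    ["lockdown", "travel"],
]
# Hypothetical topic-word weights, standing in for the output of an LDA model.
topic_word = {
    "T1": {"vaccine": 0.5, "dose": 0.3, "trial": 0.2},
    "T2": {"lockdown": 0.6, "travel": 0.35, "vaccine": 0.05},
}

word_net = cooccurrence_network(docs)   # word-word layer
coupling = coupling_edges(topic_word)   # word-topic layer
```

Together, the two dictionaries form a two-layer structure: weighted word-word edges from co-occurrence and weighted word-topic edges from the topic model. Measures such as topic status or drifting probability would be computed on top of this structure, using definitions given in the body of the paper.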
















Acknowledgements
This work is supported by the National Natural Science Foundation of China [grant numbers 71874088, 71704085]; the Cultivation Base of Excellent Innovation Team in Philosophy & Social Sciences in Jiangsu Universities [grant number 2017ZSTD022]; Postgraduate Research & Practice Innovation Program of Jiangsu Province [grant number KYCX20_0840].
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhu, H., Qian, L., Qin, W. et al. Evolution analysis of online topics based on ‘word-topic’ coupling network. Scientometrics 127, 3767–3792 (2022). https://doi.org/10.1007/s11192-022-04439-x
DOI: https://doi.org/10.1007/s11192-022-04439-x