Abstract
Combining topic models with semantic methods can help reveal how topics are distributed across semantic concepts and how those distributions change over time, offering a new perspective on research into topic evolution. This study proposes a word-based approach that quantifies these semantic distributions and their changes during topic evolution by combining the dynamic topic model (DTM) with the word2vec model. An empirical study on a dataset from library and information science (LIS) derives the topic-semantic probability distribution and constructs the evolving dynamics of the topics. The characteristics of these evolving dynamics are used to explain the semantic distributions of topics, and their regularities are summarized to explain how those distributions change during topic evolution. Results show that no LIS topic is confined to a single semantic concept; most topics span multiple semantic concepts. Three kinds of topics are identified in LIS: convergent, diffusive, and stable topics. Identifying these different modes of topic evolution sheds further light on the development of the field. In addition, the findings indicate that the popularity of a topic is unrelated to the characteristics of its evolving dynamics.
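The abstract does not specify how word-level DTM output is mapped onto semantic concepts, or how topics are sorted into convergent, diffusive, and stable kinds. The following is a minimal sketch of one plausible reading, not the authors' method. It assumes that (a) each DTM topic yields a word distribution P(word | topic) per time slice, (b) semantic concepts are represented by centroids in a word2vec embedding space, with each word assigned to its most cosine-similar centroid, (c) the topic-semantic distribution is obtained by summing P(word | topic) within each concept, and (d) a falling, rising, or flat Shannon entropy of that distribution marks a convergent, diffusive, or stable topic. The helper names, the `tol` threshold, and all vectors and probabilities are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_concepts(word_vectors, centroids):
    """Map each word to the concept whose centroid is most cosine-similar.

    word_vectors: word -> embedding (in practice from a trained word2vec model).
    centroids: concept -> centroid vector (hypothetical, e.g. from clustering).
    """
    return {
        w: max(centroids, key=lambda c: cosine(vec, centroids[c]))
        for w, vec in word_vectors.items()
    }


def topic_semantic_distribution(topic_words, word_concept):
    """Sum P(word | topic) per concept, renormalised over words with a concept."""
    mass = {}
    for w, p in topic_words.items():
        c = word_concept.get(w)
        if c is not None:
            mass[c] = mass.get(c, 0.0) + p
    total = sum(mass.values())
    return {c: m / total for c, m in mass.items()} if total else {}


def entropy(dist):
    """Shannon entropy (in bits) of a concept distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def classify_trend(entropies, tol=0.05):
    """Hypothetical rule: falling entropy -> convergent, rising -> diffusive, else stable."""
    delta = entropies[-1] - entropies[0]
    if delta < -tol:
        return "convergent"
    if delta > tol:
        return "diffusive"
    return "stable"


# Toy 2-D "embeddings" and two hypothetical concept centroids.
word_vectors = {
    "retrieval": [0.9, 0.1], "search": [0.8, 0.2],
    "citation": [0.1, 0.9], "impact": [0.2, 0.8],
}
centroids = {"IR": [1.0, 0.0], "bibliometrics": [0.0, 1.0]}
word_concept = assign_concepts(word_vectors, centroids)

# One DTM topic observed in three time slices (toy P(word | topic) values).
slices = [
    {"retrieval": 0.3, "search": 0.2, "citation": 0.3, "impact": 0.2},
    {"retrieval": 0.4, "search": 0.3, "citation": 0.2, "impact": 0.1},
    {"retrieval": 0.5, "search": 0.4, "citation": 0.05, "impact": 0.05},
]
ents = [entropy(topic_semantic_distribution(s, word_concept)) for s in slices]
print([round(e, 3) for e in ents], classify_trend(ents))
```

In this toy run the topic's probability mass concentrates on the "IR" concept over time, so its entropy falls and the rule labels it convergent. Comparing the first and last slices keeps the rule simple; a fitted slope over all slices would be a more robust alternative.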

Acknowledgements
This research is financially supported by the National Natural Science Foundation of China (Nos. 71603195 and 71921002).
Author information
Contributions
QG: Conceived and designed the analysis, collected the data, performed the analysis, and wrote the paper. XH: Conceived and designed the analysis, contributed data or analysis tools, and wrote the paper. KD: Conceived and designed the analysis and wrote the paper. ZL: Contributed data or analysis tools. JW: Conceived and designed the analysis.
About this article
Cite this article
Gao, Q., Huang, X., Dong, K. et al. Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec. Scientometrics 127, 1543–1563 (2022). https://doi.org/10.1007/s11192-022-04275-z