Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec

Scientometrics

Abstract

Combining a topic model with a semantic method can reveal how topics are distributed over semantic concepts and how those distributions change over time, offering a new perspective on topic evolution research. This study proposes a word-based approach to quantifying the semantic distributions of topics, and their changing characteristics, using the dynamic topic model (DTM) and the word2vec model. In an empirical study on a dataset from the field of library and information science (LIS), the topic-semantic probability distribution is derived and the evolving dynamics of the topics are constructed. The characteristics of these evolving dynamics are used to explain the semantic distributions of topics, and the regularities of the evolving dynamics are then summarized to explain how the semantic distributions change during topic evolution. Results show that no topic is confined to a single semantic concept, and most topics correspond to several semantic concepts in LIS. Topics in LIS fall into three kinds: convergent, diffusive, and stable. Identifying these different modes of topic evolution provides further evidence of the development of the field. In addition, the findings indicate that the popularity of a topic is unrelated to the characteristics of its evolving dynamics.
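
The idea of a topic-semantic probability distribution can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each word has already been assigned to a semantic concept (for example by clustering word2vec vectors) and that per-year topic-word probabilities are available from a DTM. The function names, the toy vocabulary, the concept labels, and the use of Shannon entropy as a concentration measure are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def topic_semantic_distribution(topic_word_probs, word_to_concept):
    """Sum a topic's word probabilities per semantic concept and renormalize.

    Both arguments are hypothetical inputs: a {word: probability} mapping for
    one topic in one time slice, and a {word: concept} mapping.
    """
    mass = defaultdict(float)
    for word, prob in topic_word_probs.items():
        concept = word_to_concept.get(word)
        if concept is not None:  # skip words without a concept assignment
            mass[concept] += prob
    total = sum(mass.values())
    if total == 0:
        return {}
    return {concept: m / total for concept, m in mass.items()}


def entropy(dist):
    """Shannon entropy (natural log); lower values mean a more concentrated topic."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


# Toy data, invented for this example.
word_to_concept = {
    "citation": "bibliometrics",
    "impact": "bibliometrics",
    "query": "retrieval",
    "search": "retrieval",
}
topic_by_year = {
    2000: {"citation": 0.25, "impact": 0.25, "query": 0.25, "search": 0.25},
    2010: {"citation": 0.5, "impact": 0.3, "query": 0.1, "search": 0.1},
}

distributions = {
    year: topic_semantic_distribution(words, word_to_concept)
    for year, words in topic_by_year.items()
}
entropies = {year: entropy(dist) for year, dist in distributions.items()}
```

In this toy case the topic's probability mass shifts toward one concept between the two time slices, so its entropy falls. Under the assumptions above, that pattern would resemble what the abstract calls a convergent topic, rising entropy a diffusive one, and roughly constant entropy a stable one; the paper's own criteria may differ.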

Acknowledgements

This research is financially supported by the National Natural Science Foundation of China (Nos. 71603195 and 71921002).

Author information

Contributions

QG: conceived and designed the analysis, collected the data, performed the analysis, wrote the paper. XH: conceived and designed the analysis, contributed data or analysis tools, wrote the paper. KD: conceived and designed the analysis, wrote the paper. ZL: contributed data or analysis tools. JW: conceived and designed the analysis.

Corresponding author

Correspondence to Xiao Huang.

Cite this article

Gao, Q., Huang, X., Dong, K. et al. Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec. Scientometrics 127, 1543–1563 (2022). https://doi.org/10.1007/s11192-022-04275-z
