Abstract
Identifying how topics related to advanced technologies evolve has become a strategic issue for industrial development worldwide. In this paper, we propose a research framework, based on multiple data sources, that integrates topic modeling with a social network perspective to analyze topic evolution in a specific technology field. First, we improve the Combined Topic Model, which has so far achieved the best results in promoting topic coherence, by introducing the best-performing domain-specific pre-trained BERT model and Bayesian optimization, and we use the resulting Optimized Combined Topic Model (OCTM) for topic recognition. Second, we construct a co-occurrence network for each time window, with topics as nodes, and calculate the co-occurrence coefficient of every topic pair. We then combine the co-occurrence coefficients between topics within a time window with the similarity between topics in adjacent time windows to determine the type of topic evolution and identify the evolution paths. Third, we take node characteristics in the network, such as harmonic closeness centrality and weighted degree, weight them with the Criteria Importance Through Intercriteria Correlation (CRITIC) method, and define an importance index for each node in the undirected weighted network. Finally, we select the critical topic evolution paths according to node importance for detailed analysis. We choose CRISPR technology as the empirical research field to preliminarily verify the operability and rationality of the method.
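The node-importance step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, topic labels, and edge values are hypothetical, harmonic closeness is computed on hop counts rather than weighted distances, and the importance index is assumed to be a CRITIC-weighted sum of min-max normalized features. The abstract does not specify any of these details.

```python
from collections import deque
from math import sqrt

# Hypothetical co-occurrence network for one time window:
# (topic_a, topic_b) -> co-occurrence coefficient (illustrative values only).
EDGES = {
    ("T1", "T2"): 0.6,
    ("T1", "T3"): 0.2,
    ("T2", "T3"): 0.5,
    ("T3", "T4"): 0.4,
    ("T4", "T5"): 0.1,
}

def build_adjacency(edges):
    """Undirected weighted adjacency map: node -> {neighbor: weight}."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def weighted_degree(adj, node):
    """Sum of the weights of all edges incident to the node."""
    return sum(adj[node].values())

def harmonic_closeness(adj, node):
    """Sum of 1/d(node, v) over reachable v; d is the hop count (assumption)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(1.0 / d for v, d in dist.items() if v != node)

def min_max(col):
    """Scale a feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0] * len(col)
    return [(x - lo) / (hi - lo) for x in col]

def sample_std(col):
    mean = sum(col) / len(col)
    return sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / den if den > 0 else 0.0

def critic_weights(columns):
    """CRITIC: weight_j is proportional to std_j * sum_k (1 - r_jk)."""
    norm = [min_max(c) for c in columns]
    info = [
        sample_std(cj) * sum(1.0 - pearson(cj, ck) for ck in norm)
        for cj in norm
    ]
    total = sum(info)
    if total == 0:  # every feature is constant: fall back to equal weights
        return [1.0 / len(norm)] * len(norm), norm
    return [c / total for c in info], norm

adj = build_adjacency(EDGES)
nodes = sorted(adj)
features = [
    [harmonic_closeness(adj, n) for n in nodes],
    [weighted_degree(adj, n) for n in nodes],
]
weights, norm = critic_weights(features)

# Assumed importance index: CRITIC-weighted sum of the normalized features.
importance = {
    n: sum(w * col[i] for w, col in zip(weights, norm))
    for i, n in enumerate(nodes)
}

for n in sorted(importance, key=importance.get, reverse=True):
    print(n, round(importance[n], 3))
```

In this toy network, T3 has the highest value on both features and T5 the lowest, so they rank first and last whatever weights CRITIC assigns. The weights only affect the ordering of nodes whose features disagree, such as T2 and T4.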
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Xu, S., Yang, Y., Huang, Y. (2023). Topic Evolution Analysis Based on Optimized Combined Topic Model: Illustrated as CRISPR Technology. In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13972. Springer, Cham. https://doi.org/10.1007/978-3-031-28032-0_4
DOI: https://doi.org/10.1007/978-3-031-28032-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28031-3
Online ISBN: 978-3-031-28032-0
eBook Packages: Computer Science, Computer Science (R0)