Abstract
Given the increasing complexity of modern scientific knowledge and the diversity and depth of the problems studied, successful research requires an understanding of the structure and evolution of trends in science. Available digital records open wide possibilities for the statistical analysis of scientific publications and related metadata, supporting topic modeling and topic evolution studies, knowledge mapping, citation indexing, and more. We investigate the dynamical properties of topics in physics by analyzing the temporal evolution of a proximity measure for word pairs based on mutual information. We use the full-text conceptualization of scientific documents provided by the ScienceWISE platform for topic mapping, trend analysis, hot-topic detection, and retrieval of relevant papers. We find that the time evolution of the relative mutual information distance reveals a hidden topic structure and could be used for quantitative analysis of current trends in scientific research.
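The abstract does not define the exact proximity measure, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each concept is a binary variable (present or absent in a paper), uses the normalized distance d(X, Y) = 1 - I(X;Y) / max(H(X), H(Y)) as one common way to turn mutual information into a distance in [0, 1], and recomputes it per publication year to show how a word-pair distance can be tracked over time. The function names `entropy`, `mi_distance` and `distance_by_year` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def entropy(counts, total):
    """Shannon entropy (in bits) of a discrete distribution given by counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mi_distance(docs, a, b):
    """Assumed normalized mutual-information distance between concepts a and b.

    Each document is a set of concepts; X = [a in doc], Y = [b in doc].
    Returns 1 - I(X;Y) / max(H(X), H(Y)), a value in [0, 1].
    """
    n = len(docs)
    # Joint counts over the four (x, y) occurrence outcomes.
    joint = defaultdict(int)
    for doc in docs:
        joint[(a in doc, b in doc)] += 1
    # Marginal counts for X and Y, ordered as (present, absent).
    px = [sum(v for (x, _), v in joint.items() if x == val) for val in (True, False)]
    py = [sum(v for (_, y), v in joint.items() if y == val) for val in (True, False)]
    h_x = entropy(px, n)
    h_y = entropy(py, n)
    h_xy = entropy(joint.values(), n)
    mi = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    denom = max(h_x, h_y)
    if denom == 0:
        # Both concepts are constant in this window, so the distance is undefined.
        return float("nan")
    return 1.0 - mi / denom


def distance_by_year(papers, a, b):
    """Track the distance over time; papers is a list of (year, concept_set)."""
    by_year = defaultdict(list)
    for year, concepts in papers:
        by_year[year].append(concepts)
    return {year: mi_distance(docs, a, b) for year, docs in sorted(by_year.items())}


# Toy corpus (invented for illustration): the two concepts become linked in 2016.
papers = [
    (2015, {"neutrino", "oscillation"}),
    (2015, {"neutrino"}),
    (2015, {"galaxy"}),
    (2015, {"oscillation"}),
    (2016, {"neutrino", "oscillation"}),
    (2016, {"neutrino", "oscillation"}),
    (2016, {"galaxy"}),
    (2016, {"galaxy"}),
]
print(distance_by_year(papers, "neutrino", "oscillation"))  # {2015: 1.0, 2016: 0.0}
```

A falling distance between two concepts over successive windows is the kind of signal one might read as the pair drawing together into a common topic. Note that mutual information measures statistical dependence rather than co-occurrence as such, so two concepts that never appear together but always appear separately also get distance 0 under this assumed normalization.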
Acknowledgements
The authors are grateful to Stanislav Vilchynsky, Oleg Ruchaisky and Alexey Boiarskyi for helpful discussions and suggestions, and to Andrey Magalich for his help with data preparation. This work was supported by the Swiss National Science Foundation Grant “Complex Information Network Manipulation” (SCOPES Grant IZ74Z0-160497/1).
Author information
Contributions
AVC and AIY conceived the study. AVC designed the study and carried out the data analyses. AVC, AIY, BGK and ILM participated in the interpretation of data. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Chumachenko, A.V., Kreminskyi, B.G., Mosenkis, I.L. et al. Dynamics of topic formation and quantitative analysis of hot trends in physical science. Scientometrics 125, 739–753 (2020). https://doi.org/10.1007/s11192-020-03610-6
DOI: https://doi.org/10.1007/s11192-020-03610-6