Dynamics of topic formation and quantitative analysis of hot trends in physical science

Scientometrics

Abstract

Given the growing complexity of modern scientific knowledge and the diversity and depth of the problems studied, successful research requires an understanding of the structure and evolution of trends in science. Available digital records open wide possibilities for the statistical analysis of scientific publications and related metadata, including topic modeling and evolution, knowledge mapping, and citation indexing. We investigate the dynamical properties of topics in physics by analyzing the temporal evolution of a proximity measure for word pairs that is related to mutual information. We use the full-text conceptualization of scientific documents provided by the ScienceWISE platform for topic mapping, trend analysis, detection of hot topics, and retrieval of relevant papers. We find that the time evolution of the relative mutual information distance reveals a hidden topic structure and could be used for quantitative analysis of current trends in scientific research.
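As a rough illustration of the kind of quantity the abstract refers to, the sketch below computes a mutual-information-based distance between two concepts from their occurrence in documents and tracks it year by year. The abstract does not give the exact definition of the measure, so the normalization d = 1 - I(A;B)/H(A,B), the representation of a document as a set of concepts, and the names `mi_distance` and `distance_over_time` are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mi_distance(docs, a, b):
    """Assumed distance d = 1 - I(A;B) / H(A,B) between two concepts.

    docs is a list of sets of concepts; A and B are the binary indicators
    "concept a occurs in the document" and "concept b occurs in the document".
    The result lies in [0, 1]: 0 for perfectly coupled concepts, 1 for
    statistically independent ones.
    """
    n = len(docs)
    joint = Counter((a in d, b in d) for d in docs)
    h_joint = entropy(joint.values(), n)
    if h_joint == 0.0:
        # Both indicators are constant, so the ratio is undefined.
        # Convention chosen for this sketch: report maximal distance.
        return 1.0
    count_a = sum(1 for d in docs if a in d)
    count_b = sum(1 for d in docs if b in d)
    h_a = entropy([count_a, n - count_a], n)
    h_b = entropy([count_b, n - count_b], n)
    mutual_info = h_a + h_b - h_joint
    return 1.0 - mutual_info / h_joint


def distance_over_time(docs_by_year, a, b):
    """Evaluate the distance separately on each year's documents."""
    return {year: mi_distance(docs, a, b)
            for year, docs in sorted(docs_by_year.items())}


if __name__ == "__main__":
    # Hypothetical toy corpus: the two concepts are unrelated in 2018
    # and always appear together in 2019.
    corpus = {
        2018: [{"neutrino", "oscillation"}, {"neutrino"}, {"oscillation"}, set()],
        2019: [{"neutrino", "oscillation"}, {"galaxy"},
               {"neutrino", "oscillation"}, {"galaxy"}],
    }
    for year, d in distance_over_time(corpus, "neutrino", "oscillation").items():
        print(year, round(d, 3))
```

Under these assumptions, a pair whose distance shrinks steadily over successive years is a candidate for an emerging topic; a real analysis would need far larger yearly samples than this toy corpus for the entropy estimates to be meaningful.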

Notes

  1. http://sciencewise.info/.

  2. https://arxiv.org/.

Acknowledgements

The authors are grateful to Stanislav Vilchynsky, Oleg Ruchaisky and Alexey Boiarskyi for helpful discussions and suggestions, and to Andrey Magalich for his help with data preparation. This work was supported by the Swiss National Science Foundation Grant “Complex Information Network Manipulation” (SCOPES Grant IZ74Z0-160497/1).

Author information

Contributions

AVC and AIY conceived the study. AVC designed the study and carried out the data analyses. AVC, AIY, BGK and ILM participated in the interpretation of the data. All authors read and approved the final manuscript.

Corresponding author

Correspondence to A. V. Chumachenko.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Chumachenko, A.V., Kreminskyi, B.G., Mosenkis, I.L. et al. Dynamics of topic formation and quantitative analysis of hot trends in physical science. Scientometrics 125, 739–753 (2020). https://doi.org/10.1007/s11192-020-03610-6

  • DOI: https://doi.org/10.1007/s11192-020-03610-6
