Abstract
The highly cited papers defined by Clarivate Analytics' Essential Science Indicators (ESI) have been widely used to measure the scientific performance of scientists, research institutions, universities and countries. However, few studies have examined which factors make a paper an ESI highly cited paper, and the prediction of ESI highly cited papers has also received little attention. Drawing on existing research on the factors that influence a paper's citations, we choose four classical factors in this study: the scientific impact of the first author, the scientific impact of the potential leader, the scientific impact of the team, and the relevance of the authors' existing papers. Following the definition of ESI highly cited papers, we develop a new measure of a paper's scientific impact. First, we compute the statistical properties of the four factors using APS data and Nobel data, in order to study how the factors behave for ESI highly cited papers. Then, Spearman correlation and logistic regression are applied to explore the relationship between the four factors and papers' scientific impact. Finally, we try to predict highly cited papers with NN algorithms that incorporate the four factors. The results show that the potential leader factor plays a more important role in the short term than in the long term, whereas the team factor is more important in the long term. Interestingly, the first author factor has no obvious effect on scientific impact among the top 1% of papers. The prediction results are better than random.
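The two building blocks named in the abstract, a top 1% label modelled on the ESI definition and a Spearman rank correlation, can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple layout `(paper_id, year, citations)`, the function names, and the choice to rank papers within each publication year are assumptions made for illustration, and the paper's own impact measure may differ.

```python
import math
from collections import defaultdict


def label_top_fraction(papers, fraction=0.01):
    """Return ids of papers in the top `fraction` by citations within each year.

    `papers` is a list of (paper_id, year, citations) tuples (hypothetical layout).
    Ties at the cutoff are broken by paper_id, which is an arbitrary choice.
    """
    by_year = defaultdict(list)
    for paper_id, year, citations in papers:
        by_year[year].append((citations, paper_id))

    top = set()
    for rows in by_year.values():
        rows.sort(reverse=True)  # most cited first
        k = max(1, math.ceil(fraction * len(rows)))  # at least one paper per year
        top.update(paper_id for _, paper_id in rows[:k])
    return top


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In this sketch a factor such as the first author's prior impact would be passed as `x` and the papers' citation counts as `y`; the resulting labels from `label_top_fraction` could then serve as the binary target for a logistic regression.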
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61603046 and 61374175) and the Natural Science Foundation of Beijing (Grant No. L160008).
Cite this article
Wang, F., Fan, Y., Zeng, A. et al. Can we predict ESI highly cited publications? Scientometrics 118, 109–125 (2019). https://doi.org/10.1007/s11192-018-2965-6
DOI: https://doi.org/10.1007/s11192-018-2965-6