Citation impact prediction for scientific papers using stepwise regression analysis

Scientometrics

Abstract

Researchers typically pay greater attention to scientific papers published within the last 2 years, especially those likely to have high citation impact in the future. However, the accuracy of current citation impact prediction methods remains unsatisfactory. This paper argues that objective features of scientific papers can make citation impact prediction relatively accurate. A paper’s feature space is constructed from its external features, features of its authors, features of the journal of publication, and features of its citations. Stepwise multiple regression analysis is used to select appropriate features from this space and to build a regression model that explains the relationship between citation impact and the chosen features. The validity of the model is verified experimentally in the subject area of Information Science & Library Science. The results show that the regression model is effective within this subject area.
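The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified forward-only variant of stepwise regression, uses a gain in adjusted R² as the entry criterion instead of an F-test, and runs on synthetic data. The feature names, the `min_gain` threshold, and the helper functions `adjusted_r2` and `forward_stepwise` are all hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit ordinary least squares with an intercept and return adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection: at each step add the feature that raises
    adjusted R^2 the most, and stop when the gain falls below min_gain.
    (Assumption: full stepwise regression also tests entered features
    for removal, which this sketch omits.)"""
    selected = []
    best = 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in selected], best


# Synthetic data (assumption): only two of the four features drive the response.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)
names = ["journal_impact", "page_count", "author_h_index", "reference_count"]

chosen, score = forward_stepwise(X, y, names)
print(chosen, round(score, 3))
```

In this toy setup the procedure should keep only the two informative features and leave out the noise columns. A real analysis would use observed bibliometric features and a significance-based entry and removal rule.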

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 70973031).

Author information

Corresponding author

Correspondence to Tian Yu.

About this article

Cite this article

Yu, T., Yu, G., Li, PY. et al. Citation impact prediction for scientific papers using stepwise regression analysis. Scientometrics 101, 1233–1252 (2014). https://doi.org/10.1007/s11192-014-1279-6

  • DOI: https://doi.org/10.1007/s11192-014-1279-6
