Abstract
Researchers typically pay particular attention to scientific papers published within the last two years, especially papers that may have a high citation impact in the future. However, current citation impact prediction methods are still not sufficiently accurate. This paper argues that objective features of scientific papers can make citation impact prediction relatively accurate. A paper's feature space is constructed from four groups of features: external features of the paper itself, features of its authors, features of the journal in which it was published, and features of its citations. Stepwise multiple regression analysis is used to select appropriate features from this space and to build a regression model that explains the relationship between citation impact and the selected features. The validity of the model is verified experimentally in the subject area of Information Science & Library Science, and the results show that the regression model is effective within this subject.
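The abstract does not state the entry and removal criteria used in the stepwise procedure. The following minimal Python sketch shows one common variant, Efroymson-style stepwise regression with partial F-to-enter and F-to-remove thresholds, under stated assumptions: the thresholds (4.0 and 3.9), the feature names `early_citations`, `journal_impact`, and `page_count`, and the eight-paper toy dataset are all hypothetical and are not taken from the paper. Ordinary least squares is solved through the normal equations in pure Python so the example has no dependencies.

```python
"""Toy sketch of Efroymson-style stepwise regression (hypothetical data and thresholds)."""
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, Sequence[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: a feature is collinear with others")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X: Features, y: Sequence[float], cols: List[str]) -> Tuple[List[float], float]:
    """Fit y on an intercept plus `cols` via the normal equations; return (beta, RSS)."""
    rows = [[1.0] + [X[c][i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, sum(e * e for e in residuals)


def partial_f(X: Features, y: Sequence[float], smaller: List[str], larger: List[str]) -> float:
    """Partial F statistic for the single extra column in `larger` (nested models)."""
    _, rss_small = ols(X, y, smaller)
    _, rss_large = ols(X, y, larger)
    if rss_large < 1e-12:
        return float("inf")  # perfect fit: the extra column explains everything left
    df_resid = len(y) - len(larger) - 1
    return (rss_small - rss_large) / (rss_large / df_resid)


def stepwise(X: Features, y: Sequence[float], f_enter: float = 4.0,
             f_remove: float = 3.9, max_steps: int = 100) -> List[str]:
    """Alternate forward entry and backward removal until the selection stops changing."""
    selected: List[str] = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F, if above f_enter.
        candidates = [c for c in X if c not in selected]
        if candidates and len(y) - len(selected) - 2 > 0:
            scores = {c: partial_f(X, y, selected, selected + [c]) for c in candidates}
            best = max(scores, key=scores.get)
            if scores[best] > f_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the selected feature with the smallest partial F, if below f_remove.
        if selected:
            scores = {c: partial_f(X, y, [s for s in selected if s != c], selected)
                      for c in selected}
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Hypothetical, already-standardized toy features for 8 papers (not data from the paper).
X: Features = {
    "early_citations": [1, 1, 1, 1, -1, -1, -1, -1],  # a citation feature
    "journal_impact": [1, 1, -1, -1, 1, 1, -1, -1],   # a journal feature
    "page_count": [1, -1, 1, -1, 1, -1, 1, -1],       # an external feature
}
y = [17.1, 13.9, 10.1, 10.9, 8.1, 8.9, 5.1, 5.9]      # hypothetical citation counts

if __name__ == "__main__":
    chosen = stepwise(X, y)
    beta, rss = ols(X, y, chosen)
    print(chosen, [round(b, 3) for b in beta], round(rss, 3))
    # → ['early_citations', 'journal_impact'] [10.0, 3.0, 2.0] 8.08
```

On this toy data the procedure enters `early_citations`, then `journal_impact`, and rejects `page_count` because its partial F (0.04) is far below the entry threshold. The toy features are deliberately mutually orthogonal so the outcome can be checked by hand; real bibliometric features are usually correlated, which is where the order of entry and removal starts to matter.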
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 70973031).
About this article
Cite this article
Yu, T., Yu, G., Li, PY. et al. Citation impact prediction for scientific papers using stepwise regression analysis. Scientometrics 101, 1233–1252 (2014). https://doi.org/10.1007/s11192-014-1279-6