Stopped sum models and proposed variants for citation data

Scientometrics

Abstract

It is important to identify the most appropriate statistical model for citation data in order to maximise the potential of future analyses as well as to shed light on the processes that may drive citations. This article assesses stopped sum models and some variants and compares them with two previously used models, the discretised lognormal and negative binomial, using the Akaike Information Criterion (AIC). Based upon data from 20 Scopus categories, some of the stopped sum variant models had lower AIC values than the discretised lognormal models, which were otherwise the best (with respect to AIC). However, very large standard errors were returned for some of these variant models, indicating the imprecision of the estimates and the impracticality of the approach. Hence, although the stopped sum variant models show some promise for citation analysis, they are only recommended when they fit better than the alternatives and have manageable standard errors. Nevertheless, their good fit to citation data gives evidence that two different, but related, processes may drive citations.
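
As a rough illustration of the kind of AIC comparison described above, the following Python sketch simulates counts from a Neyman type A stopped sum (a Poisson number of Poisson-distributed terms) and compares a Poisson fit with a negative binomial fit. It is not the authors' code, data or model set: the parameter values, the sample size, the function names and the golden-section search for the negative binomial size parameter are all assumptions made for this example.

```python
import math
import random


def aic(log_lik, n_params):
    """Akaike Information Criterion, 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_neyman_a(rng, lam, phi):
    """Stopped sum: add up N iid Poisson(phi) terms, where N ~ Poisson(lam)."""
    n_terms = sample_poisson(rng, lam)
    return sum(sample_poisson(rng, phi) for _ in range(n_terms))


def poisson_loglik(data, lam):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def negbin_loglik(data, mu, r):
    """Negative binomial with mean mu and size r (variance mu + mu^2 / r)."""
    p = r / (r + mu)
    return sum(
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(p) + x * math.log(1 - p)
        for x in data
    )


def fit_negbin(data, lo=1e-3, hi=1e3, iters=60):
    """Fix mu at the sample mean (its MLE) and maximise over r by
    golden-section search on log(r). Assumes the profile likelihood is
    unimodal on [lo, hi], which is an assumption of this sketch."""
    mu = sum(data) / len(data)
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc = negbin_loglik(data, mu, math.exp(c))
    fd = negbin_loglik(data, mu, math.exp(d))
    for _ in range(iters):
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = negbin_loglik(data, mu, math.exp(c))
        else:        # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = negbin_loglik(data, mu, math.exp(d))
    r = math.exp((a + b) / 2)
    return mu, r, negbin_loglik(data, mu, r)


# Synthetic "citation counts"; lam=2 and phi=3 are arbitrary illustrative values.
rng = random.Random(42)
counts = [sample_neyman_a(rng, lam=2.0, phi=3.0) for _ in range(1000)]

mean = sum(counts) / len(counts)
aic_poisson = aic(poisson_loglik(counts, mean), n_params=1)
_, size, nb_loglik = fit_negbin(counts)
aic_negbin = aic(nb_loglik, n_params=2)

print(f"Poisson AIC:           {aic_poisson:.1f}")
print(f"Negative binomial AIC: {aic_negbin:.1f} (size = {size:.3f})")
```

Because the simulated counts are overdispersed, the negative binomial should return the lower AIC here despite its extra parameter. The same pattern (fit each candidate by maximum likelihood, then compare 2k - 2 ln L) extends to other candidate distributions, given a log-likelihood function for each.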

Author information

Correspondence to Wan Jing Low.

Appendix

See Tables 3, 4, 5 and 6.

Table 3. AIC for all subjects for each stopped sum variant model, compared with the discretised lognormal and negative binomial models
Table 4. AIC for all subjects for the Poisson, Neyman type A, Pólya-Aeppli, Poisson inverse Gaussian (PIG), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, compared with the discretised lognormal and negative binomial models
Table 5. Estimated parameters of the negative binomial model with the SVA models
Table 6. Estimated parameters of the negative binomial model with the modified SVB models

About this article

Cite this article

Low, W.J., Wilson, P. & Thelwall, M. Stopped sum models and proposed variants for citation data. Scientometrics 107, 369–384 (2016). https://doi.org/10.1007/s11192-016-1847-z

  • DOI: https://doi.org/10.1007/s11192-016-1847-z
