Elsevier

Journal of Informetrics

Volume 7, Issue 1, January 2013, Pages 36-49

Comparison of different mathematical functions for the analysis of citation distribution of papers of individual authors

https://doi.org/10.1016/j.joi.2012.09.002

Abstract

The citation distribution of papers of selected individual authors was analyzed using five mathematical functions: power-law, stretched exponential, logarithmic, binomial and Langmuir-type. The first two functions have previously been proposed in the literature, whereas the remaining three are novel and are derived following the concepts of growth kinetics of crystals in the presence of additives which act as inhibitors of growth. Analysis of the citation distribution data of the authors revealed that the goodness-of-fit parameter R² was highest for the empirical binomial relation, high and comparable for the stretched exponential and Langmuir-type functions, relatively low for the power law, and lowest for the logarithmic function. In the Langmuir-type function, a parameter K, defined as the Langmuir constant, which characterizes the citation behavior of the authors, has been identified. Based on the Langmuir-type function, an expression is also proposed for the cumulative citations L of an author in terms of the extrapolated value of citations l0 corresponding to rank n = 0, his/her constant K, and the number N of papers receiving citations l ≥ 1.

Highlights

► Citation rank-order distribution of selected authors is analyzed.
► Power-law, exponential, logarithmic, binomial and Langmuir-type relations are used.
► Among the functions, logarithmic, binomial and Langmuir-type functions are novel.
► Binomial, exponential and Langmuir-type functions describe the data satisfactorily.
► Parameter K of Langmuir-type function characterizes citation behavior of an author.

Introduction

Investigation of the distribution of authors, citations and publications is an active research area in informetrics (Egghe and Waltman, 2011, Egghe, 2009, Egghe, 2011, Egghe, 2012, Guerrero-Bote et al., 2007, Kretschmer and Rousseau, 2001, Lancho-Barrantes et al., 2010, Leherrere and Sornette, 1998, Perc, 2010, Radicchi et al., 2008, Redner, 1998, Redner, 2005, Tsallis and de Albuquerque, 2000, Vieira and Gomes, 2010, Wallace et al., 2009). Various laws (e.g. Lotka's and Zipf's laws) and functions have been proposed in the literature to describe these informetric distributions and to explain the mechanism underlying their occurrence. Citation distributions, for example, have been studied using the following approaches: (1) theoretical studies involving modeling of citation behavior using a preselected mathematical function to generate citations (Burrell, 2001, Burrell, 2002, Egghe, 2009, Egghe, 2012, Kretschmer and Rousseau, 2001, Nadarajah and Kotz, 2007), (2) empirical studies devoted to the analysis of a dataset, constructed over a selected time window or a long period of time for a single discipline, speciality or journal, using known mathematical functions (Bornmann and Daniel, 2009, Clauset et al., 2009, Companario, 2010, Perc, 2010, Radicchi et al., 2008, Redner, 1998, Redner, 2005, Vieira and Gomes, 2010, Wallace et al., 2009), and (3) phenomenological approaches based on describing citation data using specific microscopic models (Barabasi and Albert, 1999, Gupta et al., 2008, Naumis and Cocho, 2007, Price, 1965, Price, 1976, Simkin and Roychowdhury, 2007, Tsallis and de Albuquerque, 2000, Wallace et al., 2009).

A power-law type of behavior of citation distribution was suggested in the first empirical studies in the area (Naranan, 1971, Price, 1965, Price, 1976, Seglen, 1992), but it is now recognized that a single function such as the power law is unable to describe citation distributions over the whole range of citations (Perc, 2010, Radicchi et al., 2008, Redner, 1998, Redner, 2005, van Raan, 2001, Wallace et al., 2009). Redner (1998, 2005) examined the citation distributions of large data sets of articles published in Physical Review and found that power-law behavior dominates at high numbers of citations whereas a stretched exponential function provides a better fit at small numbers of citations. Based on the general, nonextensive thermostatistical formalism, Tsallis and de Albuquerque (2000) proposed a new function, now known as the Tsallis distribution function, to describe the citation distribution in the entire citation range. Radicchi et al. (2008) used the lognormal distribution function, derived from reorganization of the stretched exponential function, to fit data on 14 of more than 200 subject categories. Wallace et al. (2009) examined the citation distribution of papers published between 1900 and 2006 in natural sciences and engineering, medicine and social sciences and found that the stretched-exponential and Tsallis distribution functions fit the entire citation data satisfactorily. Vieira and Gomes (2010) analyzed the distribution of five-year citations of papers published in 2004 in chemistry, biology and biochemistry, mathematics and physics using one- and two-parameter (double) exponential-Poisson distributions and found that the double exponential-Poisson distribution describes the data well. Perc (2010) examined the distributions of citations of individual papers published between 1970 and 2009 by researchers in Slovenia using Zipf's plots, power law and lognormal distributions. The data were found to follow a power law at low and high values of citations.

Guerrero-Bote et al. (2007) and Lancho-Barrantes et al. (2010) studied the Journal Impact Factor (JIF) rank-order distribution and found that the distributions of JIFs were fairly close to exponential, which could be fitted to a logarithmic function. However, these authors also encountered subject areas whose JIF rank-order distributions had more sharply defined peaks and relatively long tails, something like icebergs. These authors suggested that icebergs (i.e. scientific areas) are exporters of ideas because the knowledge generated within them is visible from other areas which then import it (iceberg hypothesis).

A general feature of many informetric distributions is that the shape of the size-frequency distribution f and the shape of the rank-frequency distribution g are interrelated (Egghe and Rousseau, 2006, Egghe and Rousseau, 2012). When the size-frequency distribution is a monotonically decreasing function, the corresponding rank-frequency distribution is convex in the entire range. In contrast to this, when the size-frequency distribution increases first and then steadily decreases, thus passing through a maximum, the rank-frequency distribution is convex initially and concave at the end. The curvature of the concave part of the rank-frequency distribution is related to the position of the maximum in the size-frequency distribution. These size- and rank-frequency distributions are explained by empirical functions other than those following from Lotka's and Zipf's laws (Companario, 2010, Egghe and Waltman, 2011, Mansilla et al., 2007). However, this type of distribution behavior can also be explained by Tsallis distribution function based on nonextensive thermostatistical formalism (Gupta et al., 2008, Tsallis and de Albuquerque, 2000).

The above literature shows that the distribution of authors, citations and publications in single disciplines, specialities or journals has so far been investigated using known mathematical functions. However, no study has yet been devoted to the analysis of the distribution of citations of papers as a function of paper rank at the level of individual authors. The aim of the present paper is to analyze the citation distribution of papers of different selected authors using five mathematical functions. Two of these, the power law and the stretched exponential function, are well known in the citation literature, whereas the remaining three are novel mathematical functions. Among the new functions, the logarithmic function proposed for the analysis is similar to that used by Guerrero-Bote et al. (2007) and Lancho-Barrantes et al. (2010) for their iceberg hypothesis. The new mathematical functions proposed in this work are derived following the concepts of growth kinetics of crystals in the presence of additives which act as inhibitors of growth (Appendix A, Adsorption processes and crystal growth; Appendix B, Derivation of Eqs.). An additional aim of the study is to propose a possible mechanism of the citation rank-order distribution in terms of physical processes at the elementary level.

Section snippets

Mathematical functions

In this section the mathematical functions used in this study for the analysis of the citation distribution of the papers of different authors are briefly described.

If an author publishes N papers and l_n denotes the number of citations of the nth paper such that n is ranked in the order of decreasing citations l_n, the relation between l_n and n is given by the power-law distribution

$$l_n = \frac{l_0}{n^{\delta}}, \quad \text{(power law)} \tag{1}$$

where l_0 > 0, δ > 0 and 1 < n < N. Here l_0 is the extrapolated value of l_n when n → 0. The value of the
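As a worked step, and assuming the power law has the form l_n = l_0/n^δ (the form consistent with δ > 0 and citations that decrease with rank), taking natural logarithms of both sides gives a straight line in log-log coordinates. This is a sketch of the standard linearization, not a derivation reproduced from the paper:

```latex
\ln l_n = \ln l_0 - \delta \ln n
```

Under this assumption, a plot of ln l_n against ln n would have slope −δ and intercept ln l_0. For instance, with the purely hypothetical values l_0 = 100 and δ = 1, the paper of rank n = 10 would have l_n = 100/10 = 10 citations.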

Citation data of selected authors

We used Thomson Reuters’ ISI Web of Knowledge (Web of Science) to collect and analyze the citations of nine arbitrarily selected scientists from different research disciplines. J. Barnaś (JB), T. Ditl (TD), S. Krukowski (SK), K. Sangwal (KS), and Z.R. Żytkiewicz (ZRZ), are physicists, M. Kosmulski (MK) is a chemist, K.J. Kurzydłowski (KJK) is a materials scientist, whereas Q.L. Burrell (QLB) and L. Egghe (LE) are informetricians. The first seven of these scientists are from Poland whereas the

Analysis of citation distribution data

The real l_n(n) data for different authors were compared with Eqs. (1)-(5) mentioned above using two approaches. In the first case, the applicability of the power law (1), the stretched exponential function (2) and the logarithmic function (3) was checked by plotting the l_n(n) data in the form of dependences of (a) ln l_n on ln n, (b) ln l_n on n and (c) l_n on ln n, as shown in Figs. 1, 2 and 3, respectively. These forms of the dependences follow from Eqs. (1), (2), (3), and are usually
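The three plot forms named above can be compared numerically. The following is a minimal sketch, assuming that a straight-line least-squares fit and its R² are an acceptable stand-in for visual inspection of each plot. The function names `r_squared` and `linearization_scores`, and the synthetic data, are hypothetical and are not taken from the paper; the exact forms of Eqs. (2) and (3) are not reproduced in this excerpt, so only the plotted coordinates are used.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a least-squares straight line y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

def linearization_scores(citations):
    """Score the three plot forms (a), (b), (c) for one author's citation counts."""
    # Rank papers in order of decreasing citations l_n.
    l = np.sort(np.asarray(citations, dtype=float))[::-1]
    # Keep only papers with at least one citation so that ln l_n is defined.
    l = l[l >= 1]
    n = np.arange(1, len(l) + 1, dtype=float)
    return {
        "a": r_squared(np.log(n), np.log(l)),  # ln l_n on ln n
        "b": r_squared(n, np.log(l)),          # ln l_n on n
        "c": r_squared(np.log(n), l),          # l_n on ln n
    }

# Synthetic illustration only: citations generated from an exact power law
# l_n = l_0 / n**delta with hypothetical l_0 = 200 and delta = 0.8.
ranks = np.arange(1, 51)
synthetic = 200.0 / ranks ** 0.8
scores = linearization_scores(synthetic)
print(scores)
```

For data generated from an exact power law, form (a) is a perfect straight line, so its R² should exceed those of forms (b) and (c). Real citation records would be passed to `linearization_scores` in place of the synthetic array.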

Summary and conclusions

The citation rank-order distribution of papers of different selected authors was analyzed in this work using five functions: power law (1), stretched exponential function (2), logarithmic function (3), binomial function (4) and Langmuir-type function (5). The former two functions have previously been proposed in the literature whereas the remaining three are novel and are derived following the concepts of growth kinetics of crystals in the presence of additives which act as inhibitors of

Acknowledgements

The author expresses his gratitude to the anonymous referees for their advice and suggestions. He is also grateful to Dr. K. Wójcik for preparing Figs. A1 and B1.

References (42)

  • A.L. Barabasi et al.

    Emergence of scaling in random networks

    Science

    (1999)
  • L. Bornmann et al.

    Universality of citation distribution – A validation of Radicchi et al.’s relative indicator cf = c/c0 at the micro level using data from chemistry

    Journal of the American Society for Information Science and Technology

    (2009)
  • Q.L. Burrell

    Stochastic modeling of the first-citation distribution

    Scientometrics

    (2001)
  • Q.L. Burrell

    The nth-citation distribution and obsolescence

    Scientometrics

    (2002)
  • A. Clauset et al.

    Power-law distributions in empirical data

    SIAM Review

    (2009)
  • A.A. Chernov

    Modern crystallography III: Crystal growth

    (1984)
  • J.M. Companario

    Distribution of ranks of articles and citations in journals

    Journal of the American Society for Information Science and Technology

    (2010)
  • D.F. Eggers et al.

    Physical chemistry

    (1964)
  • L. Egghe

    A rationale for the Hirsch-index rank-order distribution and a comparison with the impact factor rank-order distribution

    Journal of the American Society for Information Science and Technology

    (2009)
  • L. Egghe

    The impact factor rank-order distribution revisited

    Scientometrics

    (2011)
  • L. Egghe

    Study of rank- and size-frequency functions in the case of power law growth of sources and items and proof of Heaps’ law

    Information Processing and Management

    (2012)

Cited by (10)

    • Breadth and depth of citation distribution

      2015, Information Processing and Management
    • Distributions of citations of papers of individual authors publishing in different scientific disciplines: Application of Langmuir-type function

      2014, Journal of Informetrics
      Citation Excerpt :

      The aim of the paper is two-fold: (1) to analyze the distribution of cumulative citations L and contributed citations Lf to every multiauthored papers published by individual authors working in different scientific disciplines using the newly proposed Langmuir-type function, and (2) to investigate the relationship between the Langmuir constant K of the distribution function, the number Nc of papers of an individual author receiving citations and the effectiveness parameter α of this function. Sangwal (2013a, 2013b) reported the Langmuir-type function of rank-order distribution of items following the concepts of adsorption processes involved during crystal growth and the basic concepts used in its derivation. The basic concepts used in the derivation of this function are briefly described below.

    • The h-index: A case of the tail wagging the dog?

      2013, Journal of Informetrics
      Citation Excerpt :

      He also gives the number of papers and number of citations for each so that we can calculate the quasi h-index as described. (Note that when constructing the standard frequency distributions from Tables 1 and 2 of Sangwal (2013) we discovered a number of inconsistencies. In Table 1 we have relied on the summary statistics as reported by Sangwal (2013).)

    • Citation and impact factor distributions of scientific journals published in individual countries

      2013, Journal of Informetrics
      Citation Excerpt :

      This fitting procedure is described in detail in the previous paper (Sangwal, 2013a). As mentioned before (Sangwal, 2013a), we observed empirically that for a simple two-parameter equation the “dependency” is related to the goodness-of-the-fit parameter R2, i.e. dependency1/2 ≈ R2, but the values of the fitting “dependency” for different parameters of a nonlinear mathematical function are different for a dataset. Therefore, we considered the lowest value of the “estimated” goodness-of-the-fit parameter R2 from the set of the “dependencies” corresponding to the best-fit parameters of a mathematical equation used for the analysis of a given dataset.

    • Three dimensions of scientific impact

      2020, Proceedings of the National Academy of Sciences of the United States of America