Elsevier

Journal of Informetrics

Volume 4, Issue 1, January 2010, Pages 118-123

Hirsch-type characteristics of the tail of distributions. The generalised h-index

https://doi.org/10.1016/j.joi.2009.10.002

Abstract

In this paper, a generalisation of the h-index and g-index is given on the basis of non-negative real-valued functionals defined on subspaces of the vector space generated by the ordered samples. Several Hirsch-type measures are defined and their basic properties are analysed. Empirical properties are illustrated using examples from the micro- and meso-level. Among these measures, the h-index proved the most robust, and the arithmetic and geometric g-indices the least robust. The μ-index and the harmonic g-index provide more balanced results and are still robust enough.

Introduction

The statistical analysis of the tail of scientometric distributions has always been a challenge to the community. Their typical “long-tail” property is shared with distributions observed in social processes, and the tail usually stands for outstanding performance. Recently, the introduction of the h-index by Hirsch (2005) has given new impetus to methodological research on this topic.

In principle, two different approaches are used: the traditional one, which determines and analyses the tail on the basis of the complete distribution, and a more recent one, which defines the tail of a distribution on the basis of the upper frequency ranks in a self-adjusting way. A typical example of the first approach is the method of characteristic scores and scales (CSS), which was introduced in the 1980s (Glänzel & Schubert, 1988). This method iteratively truncates a sample at its mean value and recalculates the mean of the truncated sample until the procedure is stopped or no new scores are obtained. The method aimed at finding self-adjusting scores for the high end of citation distributions, in particular by scaling the upper tail of those distributions. Although this method is self-adjusting according to the characteristics of the underlying distribution, the number of classes forming the tail of the distribution is required as an external “parameter”. All scores are defined iteratively, so to speak from the low end up to the high end. The other paradigmatic approach was introduced by Hirsch (2005). Besides the knowledge of the ordered sample, no further statistics on or characteristics of the underlying distribution are required. The procedure for determining the relevant tail of the distribution starts with the largest observation and continues down to a rank or index at which the process is stopped according to the given definition. The rest of the distribution beyond this index is practically of no interest for the algorithm. We will call such algorithms Hirsch-type approaches. The h-index is the most prominent representative of such methods. The tail defined by the h-index is called the Hirsch-core or h-core (Rousseau, 2006). The two types of methods are in a sense “orthogonal” to each other, but it can be shown that they can be harmonised by adjusting the underlying ordered sample (Glänzel, 2009a).
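The iterative truncation behind CSS can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `css_thresholds`, the parameter `classes`, and the choice to keep observations that are greater than or equal to the mean (rather than strictly greater) are assumptions made for the example.

```python
def css_thresholds(citations, classes=3):
    """Return the successive mean values obtained by iterated truncation.

    `classes` plays the role of the external parameter that fixes how many
    scores are computed (hypothetical name, chosen for this sketch).
    """
    sample = list(citations)
    thresholds = []
    for _ in range(classes):
        if not sample:
            break
        mean = sum(sample) / len(sample)
        # Stop early when truncation no longer yields a new score.
        if thresholds and mean == thresholds[-1]:
            break
        thresholds.append(mean)
        # Assumption: observations equal to the mean stay in the sample.
        sample = [c for c in sample if c >= mean]
    return thresholds


example = [0, 0, 1, 1, 2, 3, 5, 8, 20]
scores = css_thresholds(example, classes=5)
print(scores)  # three scores: 40/9, then 11.0, then 20.0
```

In this toy sample the procedure stops after three scores even though five classes were requested, because the last truncated sample contains a single observation and its mean does not change.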

Many attempts have been made to complement the h-index by other tail indicators using one of the two approaches. Among these efforts, the g-index introduced by Egghe (2006) proved the most promising one. It is actually a pure Hirsch-type indicator in the above-mentioned sense. Other solutions aimed at building statistics on the h-core or at normalising the h-index by subject, periods of publication activity or other criteria, and can thus be considered modifications of the original measures. Recently, Woeginger (2009) has presented a true axiomatic generalisation of Egghe's g-index.

The original h-index is known to be robust, that is, it is relatively insensitive to changes in both the high and the low end of citation distributions. This feature is desirable if the effect of inessential contingencies is to be filtered out. On the other hand, the same property results in low discriminative power (Glänzel, 2009b), i.e., the h-index does not reflect differences in citation distributions regarded as essential by some. Advocates of Egghe's g-index emphasise its definitely higher discriminative power, but its practical applicability is often obstructed by its extreme sensitivity to accidental outliers.
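The contrast between robustness and outlier sensitivity can be made concrete with a small sketch. The citation counts below are invented for illustration, and the g-index is computed here under the common assumption that it cannot exceed the number of papers in the sample.

```python
def h_index(citations):
    """Largest rank h such that the h-th most cited paper has at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest rank g such that the top g papers have at least g**2 citations in total.

    Assumption: g is capped at the number of papers (no fictitious
    zero-citation papers are appended).
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


base = [12, 9, 7, 5, 4, 3, 2, 1, 1, 0]          # invented sample
with_outlier = [100, 9, 7, 5, 4, 3, 2, 1, 1, 0]  # same sample, one extreme value

print(h_index(base), g_index(base))                  # 4 6
print(h_index(with_outlier), g_index(with_outlier))  # 4 10
```

In this invented sample a single highly cited paper leaves the h-index unchanged while pushing the g-index up to the size of the whole sample.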

In this paper, a new class of generalised h-indices is proposed. The basic idea is easily understood from an alternative definition of the g-index introduced by Schreiber (2008): g is the highest rank such that the top g papers have, on average, at least g citations; in other words, it is the highest rank, g, which is not higher than the mean citation rate of the g-core. The equivalence with Egghe's original definition is obvious. If, instead of the mean citation rate, other functionals of the g-core are used, a new class of indicators is obtained. With a proper choice of the functional, the features of the indicator can be fine-tuned as required, viz., a proper balance between robustness and discriminative power can be attained.
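The idea of swapping the functional can be sketched in a few lines. The helper name `generalised_index` is hypothetical, and the sample is invented. The sketch takes the highest rank r at which a chosen functional of the top r papers is at least r: with the arithmetic mean this reproduces the g-index in Schreiber's formulation, and with the minimum it reproduces the h-index, since the minimum of the top r papers is the citation count of the r-th paper.

```python
from statistics import mean


def generalised_index(citations, functional):
    """Highest rank r with functional(top r papers) >= r.

    `functional` is any callable mapping a non-empty list of citation
    counts to a non-negative number (hypothetical interface for this sketch).
    """
    ranked = sorted(citations, reverse=True)
    best = 0
    for rank in range(1, len(ranked) + 1):
        if functional(ranked[:rank]) >= rank:
            best = rank
    return best


citations = [12, 9, 7, 5, 4, 3, 2, 1, 1, 0]  # invented sample

g = generalised_index(citations, mean)  # mean of the core: g-index
h = generalised_index(citations, min)   # minimum of the core: h-index
print(g, h)  # 6 4
```

Other functionals can be passed in the same way; which ones the paper actually studies is described in its full text, not in this sketch.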

In what follows, the idea outlined above will be expounded in detail. Examples from the micro- and meso-level will be used to analyse the features of these measures empirically.

Section snippets

Methodological rudiments

We consider the sequence of non-negative real-valued functionals $l_k$ defined on $k$-dimensional subspaces ($1 \le k \le n$) of the vector space generated by the ordered sample $\{X_i\}_{i=1}^{n}$ ($X_1 \ge X_2 \ge \dots \ge X_n$) of non-negative integer-valued random variables. We will use the following notation: $l_k(\{X_i\}_{i \le k}) = l_k(X_1, \dots, X_k)$. Furthermore, assume that $f$ is a non-negative real-valued monotone and continuously differentiable function defined on $\mathbb{R}^+$. Then we consider the following transformation: $l_k[f](\{X_i\}_{i \le k}) = f^{-1}(l_k(\{f(X_i)\}_{i \le k}))$.
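The transformation of a functional by a function f can be illustrated with a short sketch. The helper name `transformed` is hypothetical, and the two choices of f shown here (logarithm and reciprocal, both applied to the arithmetic mean) are examples picked for illustration; they assume strictly positive values, and the logarithm is non-negative only for values of at least 1.

```python
import math
from statistics import mean


def transformed(functional, f, f_inv):
    """Build the functional values -> f_inv(functional(f(x) for x in values)).

    `f_inv` must be the inverse of `f` on the range of interest; both are
    supplied by the caller (hypothetical interface for this sketch).
    """
    def wrapped(values):
        return f_inv(functional([f(x) for x in values]))
    return wrapped


# Arithmetic mean transformed by log / exp: the geometric mean.
geometric_mean = transformed(mean, math.log, math.exp)

# Arithmetic mean transformed by the reciprocal: the harmonic mean.
harmonic_mean = transformed(mean, lambda x: 1 / x, lambda y: 1 / y)

print(geometric_mean([2, 8]))  # approximately 4
print(harmonic_mean([2, 6]))   # approximately 3
```

Samples containing zero citation counts would need separate handling, since neither the logarithm nor the reciprocal is defined at zero.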

Examples

In order to illustrate these properties of the above Hirsch-type indices in the context of tail characteristics, the same anonymous samples are used as introduced in an earlier paper (Glänzel, 2009a), where four individual authors were chosen from Thomson Reuters’ Web of Science representing a group of scientists with about 25 or more years of professional experience in three different subject areas, particularly, in mathematics, chemistry and social sciences. The individuals are denoted by A,

Concluding remarks

The functional approach makes it possible to define or re-define Hirsch-type indices and to shed light on relations among them. The h-index proved the most robust indicator in this set of indices, as it is practically insensitive to extreme values of the underlying distribution. However, it has low discriminative power. Both the arithmetic (Egghe's) and the geometric g-index are sensitive to changes in the tails of the distributions. The μ-index and the harmonic g-index provide more balanced results and are

References (12)

  • Egghe, L. (2006). Theory and practice of the g-index. Scientometrics.
  • Glänzel, W. (2006). On the opportunities and limitations of the h-index (in Chinese). Science Focus.
  • Glänzel, W. (2006). On the h-index - A mathematical approach to a new measure of publication activity and citation impact. Scientometrics.
  • Glänzel, W. The role of the h-index and the characteristic scores and scales in testing the tail properties of scientometric distributions.
  • Glänzel, W. (2009b). On reliability and robustness of scientometrics indicators. Proceedings of the 5th International...
  • Glänzel, W., et al. Theoretical and empirical studies of the tail of scientometric distributions.

There are more references available in the full text version of this article.

Cited by (16)

  • Growth dynamics of citations of cumulative papers of individual authors according to progressive nucleation mechanism: Concept of citation acceleration

    2013, Information Processing and Management
    Citation Excerpt :

    In recent years, the h index proposed by Hirsch (2005) to quantify the research output of individual scientists has drawn constant attention in the academic literature. Apart from contributions dealing, among others, with improvement and modification of the h index (for example, see: Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009; Anderson, Hankin, & Killworth, 2008; Burrell, 2009; Csajbók, Berhidi, Vasas, & Schubert, 2007; Egghe, 2010a, 2010b, 2010c; Franceschini & Maisano, 2010a; Franceschini & Maisano, 2010b; Glänzel & Schubert 2010; Jin, Liang, Rousseau, & Egghe, 2007; Kosmulski, 2006; Navon, 2009; Prathap, 2006; Schubert, 2007) and discussion on the relationship between different bibliometric evaluation measures (Burrell, 2009; Van Raan, 2006), attempts have been made to give mathematical models to the h index and its modifications and to investigate its dependence on time (Burrell, 2007a, 2007b, 2009; Egghe, 2007, 2008, 2009; Glänzel, 2008; Hirsch, 2005; Nair & Turlach, 2012; Ye & Rousseau, 2008). The main prediction of the deterministic model of Hirsch (2005) and the stochastic model of Burrell (2007a, 2007b) is that the ratio h(t)/t is expected to be a time-independent constant which, according to Hirsch (2005) “should provide a useful yardstick to compare scientists of different seniority”.

  • Would it be possible to increase the Hirsch-index, π-index or CDS-index by increasing the number of publications or citations only by unity?

    2013, Journal of Informetrics
    Citation Excerpt :

    Accordingly, the eminence of the scientists is characterized by the number of papers in the “elite set” (h-core papers) within their total publications. The index represents a special statistics, and it depends on the distribution of citations among the individual journal papers and on the number of the publications (see, Glänzel, 2008; Glänzel & Schubert, 2010). The h-type indices may represent the number of citations to the h-core papers, e.g. A-index (Jin, 2006) and R-index (Jin, Liang, Rousseau, & Egghe, 2007) or different combinations of the rank number of papers and citations (e.g. g-index, Egghe, 2006).

  • The diffusion of H-related literature

    2011, Journal of Informetrics
    Citation Excerpt :

    Egghe (2009b) studied mathematical properties of h-index sequences as developed by Liang (2006). In Glänzel and Schubert (2010), a generalisation of the h-index and g-index was given on the basis of nonnegative real-valued functionals defined on subspaces of the vector space generated by the ordered samples. The authors further defined several Hirsch-type measures and analysed their basic properties.

  • A review on h-index and its alternative indices

    2023, Journal of Information Science