Definitions of time series in citation analysis with special attention to the h-index

https://doi.org/10.1016/j.joi.2008.04.003

Abstract

The structure of different types of time series in citation analysis is revealed, using an adapted form of the Frandsen–Rousseau notation. Special cases where this approach can be used include time series of impact factors and time series of h-indices, or h-type indices. This leads to a tool describing dynamic aspects of citation analysis. Time series of h-indices are calculated in some specific models.

Introduction

We began this investigation when we realized that the evolution of the h-index, of h-type indicators and, more generally, of citation indicators is a topic not yet fully addressed. Hirsch (2005) claimed that the lifetime achievement h-index of a scientist grows linearly in time and provided some evidence. By and large this evidence was corroborated by Kelly and Jennions (2006). Yet, as one needs more and more citations to raise the h-index by one point, it seems intuitively clear that the growth of a scientist's h-index should follow a concavely increasing curve, as predicted by the Egghe–Rousseau Power Law Model (2006). Is Hirsch nevertheless correct and, if so, why? Or could it be that the growth curve of the h-index is generally S-shaped, as is the case for some of the examples given in (Anderson, Hankin, & Killworth, 2008)? This problem will not be addressed in this article, but, as a first step, we intend to provide precise definitions of time series for h-indices. Such definitions are necessary to avoid possible confusion. We provide a general scheme and notation for indicating exactly which time series is studied.

Time series are used to better understand the underlying mechanism that produces them. They can also be used in forecasting. This aspect is interesting in the framework of research evaluation: how will a scientist or research group most likely perform in the future? Of course, the first question is: is this type of time series capable of predicting features that lie in the future? This question has been studied recently by Hirsch (2007) who found that the h-index series (our type 5, see further) is indeed a good predictor for future scientific achievements.

When we started writing this contribution it soon became clear that time series for journal impact factors can be defined in the same way as for h-indices. As a precise notation already exists for all types of impact factors (Frandsen & Rousseau, 2005), we adapt it to the topic studied in this contribution.

The article is organized as follows. The next section explains the adaptation of the notation introduced by Frandsen and Rousseau (2005). Then time series of citation data, based on a publication–citation matrix (in short: pc matrix), are defined and discussed in Section 3 (Types of time series of citation indicators) and Section 4 (Comments on these definitions and some examples). These two sections contain the essential ideas of this article. Section 5 (Hirsch’ elementary baseline model) and Section 6 (Time series in the power law model) focus on the h-index, presenting different time series of h-indices for two very simple models. We conclude in Section 7.

Section snippets

The adapted Frandsen–Rousseau notation for publication and citation indicator calculations

We assume that the focus is on one set of articles. This set can be a scientist's research record or, as in most examples, a journal, but it can also be the set of all journals in one particular field, or even all journals in a database. For this journal or scientist we intend to calculate an impact factor or an h-index (or a similar indicator). It might seem somewhat odd (and in practice not recommended) to calculate a person's impact factor, but, as long as this person publishes at least one

Types of time series of citation indicators

We keep a publication set fixed and study series of citation indicators derived from this set. We now define general time series of indicators and characterize what they say about the set of publications.

Time series are of the form $(s_k^{[\text{number}]})_{k=1,\ldots,\text{end}}$, where k is an index ranging from time (year) 1 to some end time. As we consider several time series they are numbered by a superscript between square brackets. Specifics for each case are shown in Table 1. For a general element of the time
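As a rough illustration of what such a series can look like for the h-index, the sketch below keeps a set of articles fixed and recomputes the h-index from the citations accumulated through each year k. It is only one possible series and is not meant to reproduce any particular type from Table 1; the data layout (`yearly_citations`), the function names and the example numbers are all assumptions made for this illustration.

```python
from itertools import accumulate


def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def cumulative_h_series(yearly_citations):
    """h-index of a fixed article set, using citations accumulated through year k.

    yearly_citations[a][k] is the number of citations article a received
    in year k + 1 (hypothetical layout, one row per article).
    """
    if not yearly_citations:
        return []
    # Running totals per article: citations received up to and including each year.
    running = [list(accumulate(row)) for row in yearly_citations]
    n_years = len(running[0])
    return [h_index([row[k] for row in running]) for k in range(n_years)]


# Hypothetical data: three articles observed over four years.
example = [
    [0, 2, 3, 1],
    [1, 1, 0, 2],
    [0, 0, 1, 1],
]
series = cumulative_h_series(example)
print(series)
```

Because the article set is fixed and citations only accumulate, a series built this way can never decrease; series built from non-cumulative citation windows need not have that property.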

Comments on these definitions and some examples

Time series are used to study the dynamics of citation analysis, revealing trends and fluctuations. They may lead to (careful) predictions. The type 1 series leads to a series of diachronous indicators, making use of all data available in the pc matrix. If year Y + M − 1 is the latest year for which data are available this is a natural approach, although publication years are treated unevenly. The type 2 series is similar to type 1 but uses cumulative data. In the application for journal impacts

Hirsch’ elementary baseline model

We consider Hirsch’ baseline model in which each year a fixed number of articles, p, is published, and each article receives each year a number of citations equal to c > 0. We consider M citation years and N publication years, N ≤ M. Of course this model is not realistic at all, but it provides a kind of baseline. Moreover, it can be considered an ‘average’ model if we use for p and c the average number of publications and citations received during the period covered by the pc matrix. A series,
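The baseline model is simple enough to simulate directly. The following minimal sketch computes a cumulative h-index for each year k, counting all articles published up to year k and all citations they have received so far. It assumes that an article starts receiving its c citations in its own publication year; that convention, the function names and the parameter values are assumptions of this illustration, and the series shown is not necessarily one of the types defined in the article.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def baseline_h_series(p, c, n_years):
    """Cumulative h-index per year when p articles appear each year
    and every article receives c citations per year.

    Assumption: citations are counted from the publication year onward,
    so an article published in year j has c * (k - j + 1) citations
    at the end of year k.
    """
    series = []
    for k in range(1, n_years + 1):
        citations = []
        for j in range(1, k + 1):
            citations.extend([c * (k - j + 1)] * p)
        series.append(h_index(citations))
    return series


# Hypothetical parameters: 2 articles per year, 1 citation per article per year.
print(baseline_h_series(p=2, c=1, n_years=4))
```

With these small parameters the series moves in integer steps, so the growth looks uneven over a few years; longer runs make the overall trend easier to judge.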

Time series in the power law model

In the power law model the h-index is equal to $T^{1/\alpha}$, where T is the total number of publications under consideration and where the number of articles with t citations is given by a power law of the form $C/t^{\alpha}$. As in the first model we assume that the number of publications is the same each year and equal to p. No assumptions are made about the number of citations, except that they follow a power law. The exponent of the variable t is in general different in each case and for each element in a
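To make the formula concrete, the sketch below evaluates $T^{1/\alpha}$ year by year under two simplifying assumptions that go beyond the text: the publication set is cumulative, so that T = p·k after k years, and the exponent α is held fixed, although the text notes that it generally differs between cases and elements. The function name and parameter values are hypothetical, and the result is left as a real number rather than rounded to an integer.

```python
def power_law_h_series(p, alpha, n_years):
    """Real-valued h = T ** (1 / alpha) for T = p * k, k = 1..n_years.

    Assumptions of this sketch: p publications per year, a cumulative
    publication set (T = p * k) and one fixed exponent alpha for all years.
    """
    return [(p * k) ** (1.0 / alpha) for k in range(1, n_years + 1)]


# Hypothetical parameters: 4 publications per year, alpha = 2.
series = power_law_h_series(p=4, alpha=2.0, n_years=4)
for year, h in enumerate(series, start=1):
    print(year, round(h, 3))
```

Under these assumptions and for α > 1 the yearly increments shrink, which corresponds to the concavely increasing curve discussed in the introduction.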

Conclusion

The empirical pc matrix combined with the adapted F–R notation clearly draws attention to fundamentally different approaches to time series in citation analysis. Time series are used to study the dynamics of citation analysis, expounding trends and fluctuations. They may lead to predictions. We hope that the series we designed will be useful in addressing problems related to the development and characterisation of scientific research, as represented by citation indicators such as impact

Acknowledgements

The authors thank Leo Egghe, Per Ahlgren and Ravichandra Rao for useful suggestions during the preparation of this article. Research of R. Rousseau was supported by the National Natural Science Foundation of China through grant no. 70673019.

References (21)

  • C.D. Kelly et al. The h-index and career assessment by numbers. Trends in Ecology and Evolution (2006)
  • T.T. Anderson et al. Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics (2008)
  • Q.L. Burrell. Hirsch index or Hirsch rate? Some thoughts arising from Liang's data. Scientometrics (2007)
  • Egghe, L. (2008). Mathematical study of h-index sequences....
  • L. Egghe et al. Fundamental properties of rhythm sequences. Journal of the American Society for Information Science and Technology (2008)
  • L. Egghe et al. An informetric model for the Hirsch-index. Scientometrics (2006)
  • R.J. Epstein. Journal impact factors do not equitably reflect academic staff performance in different medical subspecialties. Journal of Investigative Medicine (2004)
  • T.F. Frandsen et al. Article impact calculated over arbitrary periods. Journal of the American Society for Information Science and Technology (2005)
  • P.H. Franses. The diffusion of scientific publications: the case of Econometrica, 1987. Scientometrics (2003)
  • W. Glänzel. Towards a model for diachronous and synchronous citation analyses. Scientometrics (2004)
There are more references available in the full text version of this article.

Cited by (31)

  • Information and misinformation in bibliometric time-trend analysis

    2018, Journal of Informetrics
    Citation Excerpt :

    Synchronous indices use a constant year or set of years (which may mean omitting some known data) whereas diachronous indices use all available data but may then use different sets of citing years for publication years. Frandsen and Rousseau (2005) developed the idea, illustrating this with calculations of article impact over arbitrary periods, and Liu and Rousseau (2008) defined a structured set of time series in citation analysis, illustrating this via the h-index. This paper starts with the synchronous/diachronous disparity and shows the divergent answers that can be produced to the question ‘how does national citation performance change over time?’.

  • Ratios of h-cores, h-tails and uncited sources in sets of scientific papers and technical patents

    2013, Journal of Informetrics
    Citation Excerpt :

    Collected data are shown in Appendix A (Table A1). As there are many types of h-index sequences possible, we first note that the h-index sequences calculated in Table 1 are of type II as defined in Liu and Rousseau (2008). On the basis of the above data, we computed the three ratios RH, SH and SZ for the six topics.

  • Empirical study of the growth dynamics in real career h-index sequences

    2011, Journal of Informetrics
    Citation Excerpt :

    The main limitation of the work presented here is the relatively small size of the dataset. In the future, besides using a larger dataset, the dynamics of other types of h-index sequences (introduced by Liu and Rousseau (2008) and Egghe (2009a)) could be studied to gain a further insights into the growth dynamics of h-index sequences. JW proposed the classification procedure and wrote the paper.
