Definitions of time series in citation analysis with special attention to the h-index
Introduction
We began this investigation when we realized that the evolution over time of the h-index, of h-type indicators and, more generally, of citation indicators has not yet been fully addressed. Hirsch (2005) claimed that the lifetime achievement h-index of a scientist grows linearly in time and provided some evidence. By and large, this evidence was corroborated by Kelly and Jennions (2006). Yet, as one needs more and more citations to raise the h-index by one point, it seems intuitively clear that the growth of a scientist's h-index should follow a concavely increasing curve, as predicted by the Egghe–Rousseau Power Law Model (2006). Is Hirsch nevertheless correct and, if so, why? Or could it be that the growth curve of the h-index is generally S-shaped, as is the case for some of the examples given in Anderson, Hankin, and Killworth (2008)? This problem is not addressed in this article. Instead, as a first step, we provide precise definitions of time series for h-indices. Such definitions are necessary to avoid confusion. We provide a general scheme and notation for indicating exactly which time series is studied.
Time series are used to better understand the underlying mechanism that produces them. They can also be used in forecasting. This aspect is of interest in research evaluation: how will a scientist or research group most likely perform in the future? Of course, the first question is whether this type of time series is capable of predicting future features at all. This question was studied by Hirsch (2007), who found that the h-index series (our type 5, see below) is indeed a good predictor of future scientific achievements.
When we started writing this contribution, it soon became clear that time series for journal impact factors can be defined in the same way as for h-indices. As a precise notation already exists for all types of impact factors (Frandsen & Rousseau, 2005), we adapt it to the topic studied in this contribution.
The article is organized as follows. The next section explains the adaptation of the notation introduced by Frandsen and Rousseau (2005). Then time series of citation data, based on a publication–citation matrix (in short: p–c matrix), are defined and discussed in Sections 3 (Types of time series of citation indicators) and 4 (Comments on these definitions and some examples). These two sections contain the essential ideas of this article. Sections 5 (Hirsch’ elementary baseline model) and 6 (Time series in the power law model) focus on the h-index, presenting different time series of h-indices for two very simple models. We conclude in Section 7.
The adapted Frandsen–Rousseau notation for publication and citation indicator calculations
We assume that the focus is on one set of articles. This set can be a scientist's research record or, as in most examples, a journal, but it can also be the set of all journals in one particular field, or even all journals in a database. For this journal or scientist we intend to calculate an impact factor or an h-index (or a similar indicator). It might seem somewhat odd (and in practice not recommended) to calculate a person's impact factor, but, as long as this person publishes at least one
Types of time series of citation indicators
We keep a publication set fixed and study series of citation indicators derived from this set. We now define general time series of indicators and characterize what they say about the set of publications.
Time series are of the form , where k is an index ranging from time (year) 1 to some end time. As we consider several time series they are numbered by a superscript between square brackets. Specifics for each case are shown in Table 1. For a general element of the time
Comments on these definitions and some examples
Time series are used to study the dynamics of citation analysis, revealing trends and fluctuations. They may lead to (careful) predictions. The type 1 series leads to a series of diachronous indicators, making use of all data available in the p–c matrix. If year Y + M − 1 is the latest year for which data are available this is a natural approach, although publication years are treated unevenly. The type 2 series is similar to type 1 but uses cumulative data. In the application for journal impacts
Hirsch’ elementary baseline model
We consider Hirsch’ baseline model in which each year a fixed number of articles, p, is published, and each article receives each year a number of citations equal to c > 0. We consider M citation years and N publication years, N ≤ M. Of course this model is not realistic at all, but it provides a kind of baseline. Moreover, it can be considered an ‘average’ model if we use for p and c the average number of publications and citations received during the period covered by the p–c matrix. A series,
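The baseline model can be made concrete with a short sketch. The code below is an illustration only, not the authors' calculation: it computes one possible series, the h-index of everything published and cited up to the end of year k. The function names and the `cite_in_pub_year` switch are hypothetical, and whether an article already collects c citations in its publication year is an assumption.

```python
def h_index(citations):
    """Largest h such that at least h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def baseline_h_series(p, c, years, cite_in_pub_year=True):
    """h-index at the end of each year 1..years in the baseline model.

    p: number of articles published each year (fixed).
    c: citations each article receives per year (fixed, c > 0).
    cite_in_pub_year: assumption of this sketch; if True, an article
        already receives c citations in the year it is published.
    """
    series = []
    for k in range(1, years + 1):
        citations = []
        for pub_year in range(1, k + 1):
            # Number of years in which this cohort has been cited so far.
            cited_years = k - pub_year + (1 if cite_in_pub_year else 0)
            citations.extend([c * cited_years] * p)
        series.append(h_index(citations))
    return series


if __name__ == "__main__":
    print(baseline_h_series(p=2, c=1, years=4))
```

Other series drawn from the same p–c matrix would restrict the publication years, the citation years, or both, before `h_index` is applied.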
Time series in the power law model
In the power law model the h-index is equal to $T^{1/\alpha}$, where T is the total number of publications under consideration and where the number of articles with t citations is given by a power law of the form $C/t^{\alpha}$. As in the first model we assume that the number of publications is the same each year and equal to p. No assumptions are made about the number of citations, except that they follow a power law. The exponent of the variable t is in general different in each case and for each element in a
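As a rough numerical illustration of the relation between the h-index, T and the exponent, the sketch below evaluates the model value year by year. It rests on assumptions not taken from the text: T is set to p times k (all publications up to year k), the exponents are supplied by the caller as a hypothetical list `alphas` with one value per year, and each exponent is required to exceed 1.

```python
def power_law_h_series(p, alphas):
    """Model h-index T**(1/alpha) after each year k = 1..len(alphas).

    p: number of publications per year (fixed).
    alphas: one exponent per year (hypothetical input; the exponent
        may differ from one element of the series to the next).
    """
    series = []
    for k, alpha in enumerate(alphas, start=1):
        if alpha <= 1:
            raise ValueError("this sketch assumes alpha > 1")
        total_publications = p * k  # assumption: T counts all articles up to year k
        series.append(total_publications ** (1.0 / alpha))
    return series


if __name__ == "__main__":
    for year, h in enumerate(power_law_h_series(10, [2.0, 2.0, 2.0]), start=1):
        print(year, round(h, 2))
```

With a constant exponent the yearly increments shrink, which matches the concave growth discussed in the introduction; a varying exponent can change that shape.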
Conclusion
The empirical p–c matrix combined with the adapted F–R notation clearly draws attention to fundamentally different approaches to time series in citation analysis. Time series are used to study the dynamics of citation analysis, revealing trends and fluctuations. They may lead to predictions. We hope that the series we designed will be useful in addressing problems related to the development and characterisation of scientific research, as represented by citation indicators such as impact
Acknowledgements
The authors thank Leo Egghe, Per Ahlgren and Ravichandra Rao for useful suggestions during the preparation of this article. Research of R. Rousseau was supported by the National Natural Science Foundation of China through grant no. 70673019.
References (21)
- et al. (2006). The h-index and career assessment by numbers. Trends in Ecology and Evolution.
- et al. (2008). Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics.
- (2007). Hirsch index or Hirsch rate? Some thoughts arising from Liang's data. Scientometrics.
- Egghe, L. (2008). Mathematical study of h-index sequences....
- et al. (2008). Fundamental properties of rhythm sequences. Journal of the American Society for Information Science and Technology.
- et al. (2006). An informetric model for the Hirsch-index. Scientometrics.
- (2004). Journal impact factors do not equitably reflect academic staff performance in different medical subspecialties. Journal of Investigative Medicine.
- et al. (2005). Article impact calculated over arbitrary periods. Journal of the American Society for Information Science and Technology.
- (2003). The diffusion of scientific publications: the case of Econometrica, 1987. Scientometrics.
- (2004). Towards a model for diachronous and synchronous citation analyses. Scientometrics.
Cited by (31)
Information and misinformation in bibliometric time-trend analysis
2018, Journal of Informetrics
Citation excerpt: Synchronous indices use a constant year or set of years (which may mean omitting some known data) whereas diachronous indices use all available data but may then use different sets of citing years for publication years. Frandsen and Rousseau (2005) developed the idea, illustrating this with calculations of article impact over arbitrary periods, and Liu and Rousseau (2008) defined a structured set of time series in citation analysis, illustrating this via the h-index. This paper starts with the synchronous/diachronous disparity and shows the divergent answers that can be produced to the question ‘how does national citation performance change over time?’.
On the time evolution of received citations, in different scientific fields: An empirical study
2014, Journal of Informetrics
Two time series, their meaning and some applications
2013, Journal of Informetrics
Ratios of h-cores, h-tails and uncited sources in sets of scientific papers and technical patents
2013, Journal of Informetrics
Citation excerpt: Collected data are shown in Appendix A (Table A1). As there are many types of h-index sequences possible, we first note that the h-index sequences calculated in Table 1 are of type II as defined in Liu and Rousseau (2008). On the basis of the above data, we computed the three ratios RH, SH and SZ for the six topics.
Empirical study of the growth dynamics in real career h-index sequences
2011, Journal of Informetrics
Citation excerpt: The main limitation of the work presented here is the relatively small size of the dataset. In the future, besides using a larger dataset, the dynamics of other types of h-index sequences (introduced by Liu and Rousseau (2008) and Egghe (2009a)) could be studied to gain a further insights into the growth dynamics of h-index sequences. JW proposed the classification procedure and wrote the paper.