On reliability and robustness of scientometrics indicators based on stochastic models. An evidence-based opinion paper☆
Introduction
The terms scientometrics, bibliometrics and informetrics are usually explained as the application of mathematical and statistical methods to information and communication processes in different contexts (cf. Gorkova, 1988; Nacke, 1979; Nalimov and Mulchenko, 1969; Pritchard, 1969; Tague-Sutcliffe, 1992). Creating and applying mathematical, notably stochastic, models in scientometrics and related fields therefore seems a natural step. In particular, the links created by co-authorship relations and by citations given and received form complex networks of scientific communication, which can best be described and analysed with the help of mathematical tools.
The application of stochastic models and probability distributions has the following important advantages (Glänzel, 2008):
- 1. It provides mathematical interpretations beside the scientometric ones. This means a more general notion of phenomena with the opportunity of extensions and generalisations through the choice of appropriate models, even beyond our field. Mathematical meaning and interpretation of scientometric measures can be given by parameters and statistical functions.
- 2. It helps understand complex structures such as communication networks. Although deterministic network models also allow randomness, the use of probabilistic network models such as Bayesian networks opens new perspectives, above all, concerning inference and learning.
- 3. It provides information about statistical reliability, random errors and confidence intervals for indicators.
- 4. It allows predictions concerning the expectation and probability of future events.
Scientometric phenomena to be quantified and measured can often be expressed by non-negative integer- or real-valued random variables. Most scientometric indicators can thus be defined as statistical functions such as mean values, quantiles, relative frequencies, rank statistics and Hirsch-type statistics. We will therefore focus on the last two points in the list above, since the first two are too general to be discussed here.
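To make the idea of indicators as statistical functions concrete, the following is a minimal Python sketch, not the author's method. The citation counts are hypothetical, the function names are illustrative, and the confidence interval uses a simple normal approximation, which is an assumption that may be unreliable for the heavily skewed distributions typical of citation data.

```python
import math
from statistics import mean, median, stdev


def h_index(citations):
    """Hirsch-type statistic: largest h with at least h papers cited >= h times."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def share_uncited(citations):
    """Relative frequency of papers with zero citations."""
    return sum(1 for c in citations if c == 0) / len(citations)


def mean_with_ci(citations, z=1.96):
    """Sample mean with an approximate 95% interval (normal approximation).

    Assumption: the sample is large enough for the approximation to be
    meaningful; for small or heavy-tailed samples it is only indicative.
    """
    m = mean(citations)
    se = stdev(citations) / math.sqrt(len(citations))
    return m, (m - z * se, m + z * se)


# Hypothetical citation counts for ten papers (illustrative data only).
citations = [0, 0, 1, 2, 2, 3, 5, 8, 13, 40]

m, (lower, upper) = mean_with_ci(citations)
print("mean citation rate:", m)
print("approx. 95% interval:", (round(lower, 2), round(upper, 2)))
print("median:", median(citations))
print("share uncited:", share_uncited(citations))
print("h-index:", h_index(citations))
```

In this toy sample the single highly cited paper pulls the mean well above the median, which illustrates why an indicator value is more informative when reported together with some measure of its random error.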
Reliability and robustness of scientometric indicators based on stochastic models
Both deterministic and probabilistic models are used to describe patterns and processes of scholarly communication. In its pioneering days, scientometrics first adopted laws and models from other, often unrelated fields (e.g., the model of radioactive decay for the obsolescence of literature by Gosnell, 1944, models from quantitative linguistics for bibliometric rank frequencies by Zipf, 1949, or the theory of intellectual epidemics as a model of scientific communication by Goffman &
Conclusions
In its pioneering days, scientometrics adopted laws and models from other, often unrelated fields to describe observed phenomena. The exponential and logistic models for the growth of literature, the model of radioactive decay for the ageing of information, and epidemic models for the dissemination of information may serve as examples. These models have been supplemented by generic scientometric approaches, but most of them remained deterministic (e.g., Lotka's and
Acknowledgement
I would like to thank Ronald Rousseau for valuable comments on a previous version of this paper.
References (34)
- et al. Theory of first-citation distributions and applications. Mathematical and Computer Modelling (2001)
- On some stopping times on citation processes. From theory to indicators. Information Processing & Management (1992)
- et al. Predictive aspects of a stochastic model for citation processes. Information Processing & Management (1995)
- et al. A systematic analysis of Hirsch-type indices for journals. Journal of Informetrics (2007)
- An introduction to informetrics. Information Processing & Management (1992)
- et al. Paretian publication patterns imply Paretian Hirsch index. Scientometrics (2009)
- Beirlant, J., Einmahl, J. H. J. (2007). Asymptotics for the Hirsch Index. CentER Discussion Paper #2007-86. Accessible...
- Informetric distributions. 3. Ambiguity and randomness. Journal of the American Society for Information Science (1997)
- Stochastic modelling of the first-citation distribution. Scientometrics (2001)
- The nth-citation distribution and obsolescence. Scientometrics (2002)
- Predicting future citation behavior. Journal of the American Society for Information Science and Technology
- The use of the generalized Waring process in modelling informetric data. Scientometrics
- On the reliability of predictions based on stochastic citation processes. Scientometrics
- Bibliometrics as a research field. Course script
- The role of the h-index and the characteristic scores and scales in testing the tail properties of scientometric distributions
- A Stochastic Model for the ageing analyses of scientific literature. Scientometrics
Cited by (34)
The bibliometric quotient (BQ), or how to measure a researcher's performance capacity: A Bayesian Poisson Rasch model
2018, Journal of Informetrics
Citation Excerpt: The h-index combines citations and publications of an author in one indicator. Indicators have many advantages: They are easy to calculate, are intuitively understandable through the mathematical algorithm, and can usually be presented at many levels (individual researchers, research groups, journals, …), such as the h-index for journals (Glänzel, 2010; Waltman, 2016). Although the indicator approach works more or less successfully at the level of institutions, such as with the Leiden Ranking (Mutz & Daniel, 2015), problems can arise regarding the assessment of individual researchers.
Are there any frontiers of research performance? Efficiency measurement of funded research projects with the Bayesian stochastic frontier analysis for count data
2017, Journal of Informetrics
Citation Excerpt: This raises explicitly the questions of the effectiveness and efficiency of research funding (Hicks, 2012; Rabovsky, 2014a, 2014b). The use of nonparametric methods of productivity and efficiency analysis like DEA requires deterministic indicators (Glänzel, 2010, p. 314). Any kind of random noise or stochastic component is not considered.
A critical cluster analysis of 44 indicators of author-level performance
2016, Journal of Informetrics
Citation Excerpt: Thus, bibliometric evaluation of the individual remains framed by culturally influenced norms, disciplinary norms, and “ways of knowing” in the individual’s specialty, which is also affected by the individual’s visibility or coverage in generic citation databases (Harzing and Alakangas, 2016). A prerequisite of informed bibliometric evaluation is that assessors understand the mathematical construction of the indicator and understand how well the mathematical model fits the data used to compute the indicator (Glänzel, 2010). This in turn improves understanding of how the indicator on a particular individual’s publication/citation dataset serves as an asset or drawback in summarizing numerically the experiences and achievements of the researcher (Abramo & D’Angelo, 2011; Sandström & Sandström, 2009).
Does the specification of uncertainty hurt the progress of scientometrics?
2013, Journal of Informetrics
Statistical inference on the h-index with an application to top-scientist performance
2012, Journal of Informetrics
Citation Excerpt: Therefore, S(x) constitutes the probability that a paper of the scholar receives more than x citations. The random variable X is usually required to be “heavy-tailed” in the scientometric applications (see e.g. Glänzel (2006, 2010)), even if the results given in this section hold in general. Hence, if the scholar has published n papers, the random variables X1, …, Xn represent the citation counts for his/her n papers.
An extension of the h index that covers the tail and the top of the citation curve and allows ranking researchers with similar h
2012, Journal of Informetrics
Citation Excerpt: In any case, scalar indicators used in isolation yield a unique and unequivocal ranking but multiple or multidimensional indicators demand additional criteria or combination rules to yield a ranking. Glänzel (2010) discussed rankings based on a general class of unidimensional composite indicators obtained as weighted sums of individual indicators. The weights in these linear combinations reflect the relative importance assigned to each of the individual indicators and, thus, the decision-makers’ choice as to which individual indicator should dominate the composite criterion for ranking.
☆ This paper is based on invited lectures delivered at the Workshop “Modelling science—Understanding, forecasting, and communicating the science system” held in Amsterdam, the Netherlands, on 6–9 October 2009 and the “Fifth International Conference on Webometrics, Informetrics, and Scientometrics & Tenth COLLNET Meeting” held in Dalian, China, on 13–16 September 2009.