Disaggregated research evaluation through median-based characteristic scores and scales: a comparison with the mean-based approach
Introduction
Research evaluation and the specific instruments used in its service constitute one of the main topics of debate in contemporary academia and in higher education policy. Although motivated by many political, social and economic reasons, the increased attention towards assessing research can be explained by governments’ need to monitor and manage the performance of higher education institutions, by the need to ensure the accountability of these institutions to their stakeholders and also by the quest to base funding decisions on objective evidence (Penfield et al., 2014). While there is consensus among academics and policy makers on the importance of competitive, high quality research for economic and social prosperity, there is no universally accepted instrument for assessing research performance, quantifying scientific impact or measuring scholarly influence. The lack of a unique answer to such scientometric problems and the idea that divergent, even contradictory evaluations are always possible are a recurrent theme in the recent literature – see for instance Leydesdorff et al. (2016), Waltman et al. (2016) and Abramo and D’Angelo (2015).
On a fundamental level all types of evaluation, including scientometric appraisal and the many indicators in its toolkit, hinge on the idea of aggregation and on the specific form that the aggregative process takes. The way in which specific informational inputs are combined to yield evaluative outcomes is critical. This fact has sparked an already long debate in scientometrics, particularly in the wake of the h-index (Hirsch, 2005) and its many variants, like the g-index (Egghe, 2006), which in essence only offer an alternative aggregation of the underlying citation data. The moral of the continuing debates surrounding the h-index, of the separate debates around the journal impact factor, and of the more recent discussions hosted in the pages of this very journal regarding size-independent indicators versus efficiency indicators (Abramo and D’Angelo, 2016) is that in order to be confident in the outcomes of an evaluation it is crucial to be confident in the instrument used to conduct it. As aggregated scientometric indicators have become more important within national assessment processes and international university rankings, their properties, advantages and limitations have attracted increased attention, and the meaningful use of citation data has become a critical issue in research evaluation and in policy decisions (van Raan, 2005).
There is broad consensus in the scientometric community that aggregated indicators are inadequate for the purpose of research evaluation, since each indicator, taken separately, can only provide a partial and potentially distorted view of the performance attained by a specific unit of assessment (Hicks et al., 2015; Moed and Halevi, 2015; Van Leeuwen et al., 2003; van Raan, 2006; Vinkler, 2007). This wisdom has been affirmed a fortiori following the introduction of the Hirsch index in 2005 and the wave of Hirsch-type indicators (Bornmann et al., 2011; Schreiber, 2010) that were subsequently proposed as improvements. The overt consensus regarding the rejection of single-number indicators such as the h-index has as its corollary an implicit consensus around a more general principle: when faced with the choice between an aggregated approach and a disaggregated approach to research evaluation, the latter is to be preferred. In other words, one should use research evaluation instruments that discard as little information as possible and offer a wide and comparatively rich picture of performance. One of the contemporary research evaluation instruments that adhere to these desiderata is given by characteristic scores and scales (CSS) for scientific impact assessment (Glänzel and Schubert, 1988; Schubert et al., 1987), which represent an effort towards achieving a multi-dimensional, disaggregated perspective on research performance.
The CSS method was proposed in the late 1980s to assess the eminence of scientific journals on the basis of the citations received by the articles they publish and its cornerstone idea is that of allowing a parameter-free characterization of citation distributions in such a way that impact classes are defined recursively by appealing to successive (arithmetic) means found within a given empirical distribution. The approach is highly relevant to scientometric evaluation because it addresses one of the fundamental problems associated with the adequate statistical treatment of citation data – the skewness of science (Albarrán et al., 2011, Seglen, 1992) which makes analysis through standard statistical practice difficult and potentially biased.
The aim of this article is to explore a theoretically grounded proposal to modify CSS by changing the reference thresholds used in this evaluative instrument from arithmetic means to medians. To the author's knowledge this possibility has only been noted in a single previous study (Egghe, 2010b), where it received only a formal, theoretical exploration in a continuous Lotkaian framework. All empirical studies devoted to the application of CSS (see Section 2.1) have so far relied on the original mean-based approach. As a result, to date there are neither empirical analyses that leverage median-based CSS, nor factual comparisons of any results produced by this instrument with results relying on the mean-based approach. This article addresses these knowledge gaps by examining both mean- and median-based CSS in an application to citation data of journals indexed in the Web of Science categories Information Science and Library Science, Social Work, Microscopy and Thermodynamics. The article also offers a practical implementation of the CSS algorithms in the freely available R language and environment for statistical computing. More generally, the article argues in favor of a disaggregated, inclusive approach to research evaluation and performance assessment.
The article is structured as follows: Section 2 presents the CSS mechanism in more detail, reviews the state of the art with regard to the use of this instrument and puts forward the arguments that justify the need for the alternative, median-based approach; this section also examines the theoretical implications of this shift and provides information on the data used in the empirical investigation together with adjacent methodological notes. Section 3 presents the comparative results of the empirical analyses and highlights the distinctiveness inherent in the application of median-based CSS to citation data. Section 4 summarizes the results and provides a few concluding remarks.
Mean-based CSS in evaluative scientometrics
The fundamental idea of CSS is that of recursively defining certain performance classes for a given empirical distribution of published papers based on the observed number of citations they receive. Considering a set of n papers published in a particular field of science, one starts by sorting in descending order the observed citations received by each paper. An ordered list of the form
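The recursive thresholding described above can be sketched in code. The article's own implementation is in R; the Python fragment below is an illustrative sketch only (the function names are hypothetical, and the strict-inequality truncation convention is one possible reading of "papers cited above the threshold"). It computes the characteristic scores b1, b2, b3 and the resulting impact classes, for both the original mean-based approach and the median-based variant examined in this article.

```python
import statistics

def css_thresholds(citations, levels=3, stat=statistics.mean):
    """Recursively compute characteristic scores b1..b_levels.

    b1 is stat() of all citation counts; each subsequent b_k is
    stat() of the counts strictly greater than b_{k-1}."""
    thresholds = []
    remaining = list(citations)
    for _ in range(levels):
        if not remaining:
            break  # no papers left above the last threshold
        b = stat(remaining)
        thresholds.append(b)
        remaining = [c for c in remaining if c > b]
    return thresholds

def css_classes(citations, thresholds):
    """Assign each paper an impact class: 1 for counts not exceeding
    b1, 2 for counts in (b1, b2], and so on."""
    return [1 + sum(c > b for b in thresholds) for c in citations]

# Toy distribution: skewed, as empirical citation data typically are.
cits = [0, 0, 1, 1, 2, 3, 5, 10, 20, 50]
mean_t = css_thresholds(cits)                            # mean-based CSS
median_t = css_thresholds(cits, stat=statistics.median)  # median-based CSS
```

On skewed data the mean-based thresholds sit well above the median-based ones (here b1 is 9.2 under the mean but 2.5 under the median), which is precisely the contrast between the two instruments that the article investigates empirically.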
Citation distributions in the four subject categories
Before discussing the comparative results of mean and median-based CSS in the four subject categories selected for analysis it is useful to first provide some preliminary information regarding the citation patterns within each category. A detailed review of the citation patterns found in the four categories is provided in Table 1 which presents the upper thresholds of the deciles of the distributions of citation counts (considered from lowest to highest values) within each category together
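Decile thresholds of the kind reported in Table 1 can be obtained directly from a vector of citation counts. The article's implementation is in R; the sketch below is illustrative only, using a hypothetical toy vector and Python's standard library.

```python
import statistics

# Hypothetical citation counts for one subject category (toy data).
counts = [0, 0, 0, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 18, 30]

# Nine cut points splitting the sorted counts into ten decile groups;
# method='inclusive' treats the data as the full population rather
# than a sample drawn from it.
deciles = statistics.quantiles(counts, n=10, method='inclusive')
```

The fifth cut point is the median of the distribution; the highly skewed toy data illustrate why the upper deciles carry most of the information about citation impact.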
Concluding remarks
Disaggregation-oriented research evaluation instruments such as characteristic scores and scales are a powerful and persuasive alternative to aggregated indicators. They eschew the methodological pitfalls and simplifying tendencies of the latter and preserve a multidimensional view of research performance. Scientometrics should continually develop disaggregation-oriented instruments especially in light of the increased policy use of scientometric data. This article has offered a comparative
Funding
This research was funded by the Research Institute of the University of Bucharest through a grant awarded to the author in December 2016.
Acknowledgements
The author would like to express his gratitude for the valuable comments and suggestions made by the anonymous reviewers and by the Editor-in-Chief of the journal during the review and revision stages. These helped to improve several aspects of the initial manuscript and led to a fruitful expansion of the empirical analysis.
References (51)
- et al. (2015). Evaluating university research: Same performance indicator, different rankings. Journal of Informetrics.
- et al. (2016). A farewell to the MNCS and like size-independent indicators. Journal of Informetrics.
- et al. (2011). A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics.
- (2010). Characteristic scores and scales based on h-type indices. Journal of Informetrics.
- (2007). Characteristic scores and scales. A bibliometric analysis of subject characteristics based on long-term citation observation. Journal of Informetrics.
- et al. (2011). Remaining problems with the “New Crown Indicator” (MNCS) of the CWTS. Journal of Informetrics.
- et al. (2013). Quantitative evaluation of alternative field normalization procedures. Journal of Informetrics.
- et al. (2014). The skewness of scientific productivity. Journal of Informetrics.
- et al. (2015). Field-normalized citation impact indicators using algorithmically constructed classification systems of science. Journal of Informetrics.
- (2016). Expected number of citations and the crown indicator. Journal of Informetrics.
- A theoretical evaluation of Hirsch-type bibliometric indicators confronted with extreme self-citation. Journal of Informetrics.
- Towards a new crown indicator: An empirical analysis. Journal of Informetrics.
- The elephant in the room: The problem of quantifying productivity in evaluative scientometrics. Journal of Informetrics.
- Statistical Methods for the Social Sciences.
- The skewness of science in 219 sub-fields and a number of aggregates. Scientometrics.
- References made and citations received by scientific articles. Journal of the American Society for Information Science and Technology.
- Applying the CSS method to bibliometric indicators used in (university) rankings. Scientometrics.
- How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations. Scientometrics.
- Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology.
- Should we use the mean citations per paper to summarise a journal's impact or to rank journals in the same field? Scientometrics.
- A bibliometric classificatory approach for the study and assessment of research performance at the individual level: The effects of age on productivity and impact. Journal of the American Society for Information Science and Technology.
- Theory and practise of the g-index. Scientometrics.
- Characteristic scores and scales in a Lotkaian framework. Scientometrics.
- The role of the h-index and the characteristic scores and scales in testing the tail properties of scientometric distributions. Scientometrics.
- The application of characteristic scores and scales to the evaluation and ranking of scientific journals. Journal of Information Science.