Disaggregated research evaluation through median-based characteristic scores and scales: a comparison with the mean-based approach
Introduction
Research evaluation and the specific instruments used in its service constitute one of the main topics of debate in contemporary academia and in higher education policy. Although motivated by many political, social and economic reasons, the increased attention towards assessing research can be explained by governments’ need to monitor and manage the performance of higher education institutions, by the need to ensure the accountability of these institutions to their stakeholders and also by the quest to base funding decisions on objective evidence (Penfield et al., 2014). While there is consensus among academics and policy makers on the importance of competitive, high quality research for economic and social prosperity, there is no universally accepted instrument for assessing research performance, quantifying scientific impact or measuring scholarly influence. The lack of a unique answer to such scientometric problems and the idea that divergent, even contradictory evaluations are always possible are a recurrent theme in the recent literature – see for instance Leydesdorff et al. (2016), Waltman et al. (2016) and Abramo and D’Angelo (2015).
On a fundamental level all types of evaluation, including scientometric appraisal and the many indicators in its toolkit, hinge on the idea of aggregation and on the specific form that the aggregative process takes. The way in which specific informational inputs are combined to yield evaluative outcomes is critical. This fact has sparked an already long debate in scientometrics, particularly in the wake of the h-index (Hirsch, 2005) and its many variants, like the g-index (Egghe, 2006), which in essence only offer an alternative aggregation of the underlying citation data. The moral of the continuing debates surrounding the h-index, of the separate debates around the journal impact factor, and of the more recent discussions hosted in the pages of this very journal regarding size-independent indicators versus efficiency indicators (Abramo and D’Angelo, 2016) is that in order to be confident in the outcomes of an evaluation it is crucial to be confident in the instrument used to conduct it. As aggregated scientometric indicators have become more important within national assessment processes and international university rankings, their properties, advantages and limitations have attracted increased attention, and the meaningful use of citation data has become a critical issue in research evaluation and in policy decisions (van Raan, 2005).
There is broad consensus in the scientometric community that aggregated indicators are inadequate for the purpose of research evaluation, since each indicator, taken separately, can only provide a partial and potentially distorted view of the performance attained by a specific unit of assessment (Hicks et al., 2015; Moed and Halevi, 2015; Van Leeuwen et al., 2003; van Raan, 2006; Vinkler, 2007). This wisdom has been affirmed a fortiori following the introduction of the Hirsch index in 2005 and the wave of Hirsch-type indicators (Bornmann et al., 2011; Schreiber, 2010) that were subsequently proposed as improvements. The overt consensus regarding the rejection of single-number indicators such as the h-index has as its corollary an implicit consensus around a more general principle: when faced with the choice between an aggregated approach and a disaggregated approach to research evaluation, the latter is to be preferred. In other words, one should use research evaluation instruments that discard as little information as possible and offer a wide and comparatively rich picture of performance. One of the contemporary research evaluation instruments that adhere to these desiderata is given by characteristic scores and scales (CSS) for scientific impact assessment (Glänzel and Schubert, 1988; Schubert et al., 1987), which represent an effort towards achieving a multi-dimensional, disaggregated perspective on research performance.
The CSS method was proposed in the late 1980s to assess the eminence of scientific journals on the basis of the citations received by the articles they publish and its cornerstone idea is that of allowing a parameter-free characterization of citation distributions in such a way that impact classes are defined recursively by appealing to successive (arithmetic) means found within a given empirical distribution. The approach is highly relevant to scientometric evaluation because it addresses one of the fundamental problems associated with the adequate statistical treatment of citation data – the skewness of science (Albarrán et al., 2011, Seglen, 1992) which makes analysis through standard statistical practice difficult and potentially biased.
The aim of this article is to explore a theoretically grounded proposal to modify CSS by changing the reference thresholds used in this evaluative instrument from arithmetic means to medians. To the author's knowledge this possibility has only been noted in a single previous study (Egghe, 2010b), where it received only a formal, theoretical exploration in a continuous Lotkaian framework. All empirical studies devoted to the application of CSS (see Section 2.1) have so far relied on the original mean-based approach. As a result, to date there are neither empirical analyses that leverage median-based CSS, nor factual comparisons of any results produced by this instrument with results relying on the mean-based approach. This article addresses these knowledge gaps by examining both mean- and median-based CSS in an application to citation data of journals indexed in the Web of Science categories Information Science and Library Science, Social Work, Microscopy and Thermodynamics. The article also offers a practical implementation of the CSS algorithms in the freely available R language and environment for statistical computing. More generally, the article argues in favor of a disaggregated, inclusive approach to research evaluation and performance assessment.
The article is structured as follows: Section 2 presents the CSS mechanism in more detail, reviews the state of the art with regard to the use of this instrument and puts forward the arguments that justify the need for the alternative, median-based approach; this section also examines the theoretical implications of this shift and provides information on the data used in the empirical investigation together with adjacent methodological notes. Section 3 presents the comparative results of the empirical analyses and highlights the distinctiveness inherent in the application of median-based CSS to citation data. Section 4 summarizes the results and provides a few concluding remarks.
Mean-based CSS in evaluative scientometrics
The fundamental idea of CSS is that of recursively defining certain performance classes for a given empirical distribution of published papers based on the observed number of citations they receive. Considering a set of n papers published in a particular field of science, one starts by sorting in descending order the observed citations received by each paper. An ordered list of the form
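The recursive thresholding described above can be sketched in code. The article's own implementation is in R; the Python fragment below is an illustrative sketch only (the function names are hypothetical, and the strict-inequality truncation convention is one possible reading of "papers cited above the threshold"). It computes the characteristic scores b1, b2, b3 and the resulting impact classes, for both the original mean-based approach and the median-based variant examined in this article.

```python
import statistics

def css_thresholds(citations, levels=3, stat=statistics.mean):
    """Recursively compute characteristic scores b1..b_levels.

    b1 is stat() of all citation counts; each subsequent b_k is
    stat() of the counts strictly greater than b_{k-1}."""
    thresholds = []
    remaining = list(citations)
    for _ in range(levels):
        if not remaining:
            break  # no papers left above the last threshold
        b = stat(remaining)
        thresholds.append(b)
        remaining = [c for c in remaining if c > b]
    return thresholds

def css_classes(citations, thresholds):
    """Assign each paper an impact class: 1 for counts not exceeding
    b1, 2 for counts in (b1, b2], and so on."""
    return [1 + sum(c > b for b in thresholds) for c in citations]

# Toy distribution: skewed, as empirical citation data typically are.
cits = [0, 0, 1, 1, 2, 3, 5, 10, 20, 50]
mean_t = css_thresholds(cits)                            # mean-based CSS
median_t = css_thresholds(cits, stat=statistics.median)  # median-based CSS
```

On skewed data the mean-based thresholds sit well above the median-based ones (here b1 is 9.2 under the mean but 2.5 under the median), which is precisely the contrast between the two instruments that the article investigates empirically.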
Citation distributions in the four subject categories
Before discussing the comparative results of mean and median-based CSS in the four subject categories selected for analysis it is useful to first provide some preliminary information regarding the citation patterns within each category. A detailed review of the citation patterns found in the four categories is provided in Table 1 which presents the upper thresholds of the deciles of the distributions of citation counts (considered from lowest to highest values) within each category together
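Decile thresholds of the kind reported in Table 1 can be obtained directly from a vector of citation counts. The article's implementation is in R; the sketch below is illustrative only, using a hypothetical toy vector and Python's standard library.

```python
import statistics

# Hypothetical citation counts for one subject category (toy data).
counts = [0, 0, 0, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 18, 30]

# Nine cut points splitting the sorted counts into ten decile groups;
# method='inclusive' treats the data as the full population rather
# than a sample drawn from it.
deciles = statistics.quantiles(counts, n=10, method='inclusive')
```

The fifth cut point is the median of the distribution; the highly skewed toy data illustrate why the upper deciles carry most of the information about citation impact.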
Concluding remarks
Disaggregation-oriented research evaluation instruments such as characteristic scores and scales are a powerful and persuasive alternative to aggregated indicators. They eschew the methodological pitfalls and simplifying tendencies of the latter and preserve a multidimensional view of research performance. Scientometrics should continually develop disaggregation-oriented instruments especially in light of the increased policy use of scientometric data. This article has offered a comparative
Funding
This research was funded by the Research Institute of the University of Bucharest through a grant awarded to the author in December 2016.
Acknowledgements
The author would like to express his gratitude for the valuable comments and suggestions made by the anonymous reviewers and by the Editor-in-Chief of the journal during the review and revision stages. These helped to improve several aspects of the initial manuscript and led to a fruitful expansion of the empirical analysis.
References (51)
- et al. (2015). Evaluating university research: Same performance indicator, different rankings. Journal of Informetrics.
- et al. (2016). A farewell to the MNCS and like size-independent indicators. Journal of Informetrics.
- et al. (2011). A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics.
- (2010). Characteristic scores and scales based on h-type indices. Journal of Informetrics.
- (2007). Characteristic scores and scales. A bibliometric analysis of subject characteristics based on long-term citation observation. Journal of Informetrics.
- et al. (2011). Remaining problems with the “New Crown Indicator” (MNCS) of the CWTS. Journal of Informetrics.
- et al. (2013). Quantitative evaluation of alternative field normalization procedures. Journal of Informetrics.
- et al. (2014). The skewness of scientific productivity. Journal of Informetrics.
- et al. (2015). Field-normalized citation impact indicators using algorithmically constructed classification systems of science. Journal of Informetrics.
- (2016). Expected number of citations and the crown indicator. Journal of Informetrics.
- A theoretical evaluation of Hirsch-type bibliometric indicators confronted with extreme self-citation. Journal of Informetrics.
- Towards a new crown indicator: An empirical analysis. Journal of Informetrics.
- The elephant in the room: The problem of quantifying productivity in evaluative scientometrics. Journal of Informetrics.
- Statistical Methods for the Social Sciences.
- The skewness of science in 219 sub-fields and a number of aggregates. Scientometrics.
- References made and citations received by scientific articles. Journal of the American Society for Information Science and Technology.
- Applying the CSS method to bibliometric indicators used in (university) rankings. Scientometrics.
- How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations. Scientometrics.
- Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology.
- Should we use the mean citations per paper to summarise a journal's impact or to rank journals in the same field? Scientometrics.
- A bibliometric classificatory approach for the study and assessment of research performance at the individual level: The effects of age on productivity and impact. Journal of the American Society for Information Science and Technology.
- Theory and practise of the g-index. Scientometrics.
- Characteristic scores and scales in a Lotkaian framework. Scientometrics.
- The role of the h-index and the characteristic scores and scales in testing the tail properties of scientometric distributions. Scientometrics.
- The application of characteristic scores and scales to the evaluation and ranking of scientific journals. Journal of Information Science.