Journal of Informetrics

Volume 11, Issue 3, August 2017, Pages 748-765

Disaggregated research evaluation through median-based characteristic scores and scales: a comparison with the mean-based approach

https://doi.org/10.1016/j.joi.2017.04.003

Highlights

  • Median-based CSS are discussed as an alternative to mean-based CSS.

  • Under median-based CSS the class of poorly cited papers shrinks while the other classes grow.

  • Moving from mean to median-based CSS leads to a “Matthew effect” at the level of journals.

  • Disaggregated research evaluation is favored over the use of single aggregated indicators.

Abstract

Characteristic scores and scales (CSS) were proposed in the late 1980s as a powerful tool in evaluative scientometrics but have only recently begun to be used for systematic, multi-level appraisal. By relying on successive sample means found in citation distributions, the CSS method yields performance classes that can be used to benchmark individual units of assessment. This article investigates the theoretical and empirical consequences of a median-based approach to the construction of CSS. Mean- and median-based CSS algorithms developed in the R language and environment for statistical computing are applied to citation data of papers from journals indexed in four Web of Science categories: Information Science and Library Science, Social Work, Microscopy and Thermodynamics. Subject category-level and journal-level comparisons highlight the specificities of the median-based approach relative to the mean-based CSS. When moving from the latter to the former, substantially fewer papers are ascribed to the poorly cited CSS class and more papers become fairly, remarkably or outstandingly cited. This transition is also marked by the well-known “Matthew effect” in science. Both CSS versions promote a disaggregated perspective on research evaluation but differ in emphasis: mean-based CSS promote a more exclusive view of excellence, whereas the median-based approach promotes a more inclusive outlook.

Introduction

Research evaluation and the specific instruments used in its service constitute one of the main topics of debate in contemporary academia and in higher education policy. Although motivated by many political, social and economic factors, the increased attention towards assessing research can be explained by governments’ need to monitor and manage the performance of higher education institutions, by the need to hold these institutions accountable to their stakeholders, and by the quest to base funding decisions on objective evidence (Penfield et al., 2014). While there is consensus among academics and policy makers on the importance of competitive, high-quality research for economic and social prosperity, there is no universally accepted instrument for assessing research performance, quantifying scientific impact or measuring scholarly influence. The lack of a unique answer to such scientometric problems, and the idea that divergent, even contradictory evaluations are always possible, is a recurrent theme in the recent literature – see for instance Leydesdorff et al. (2016), Waltman et al. (2016) and Abramo and D’Angelo (2015).

On a fundamental level all types of evaluation, including scientometric appraisal and the many indicators in its toolkit, hinge on the idea of aggregation and on the specific form that the aggregative process takes. The way in which specific informational inputs are combined to yield evaluative outcomes is critical. This fact has sparked an already long debate in scientometrics, particularly in the wake of the h-index (Hirsch, 2005) and its many variants, such as the g-index (Egghe, 2006), which in essence only offer an alternative aggregation of the underlying citation data. The moral of the continuing debates surrounding the h-index, of the separate debates around the journal impact factor, and of the more recent discussions hosted in the pages of this very journal regarding size-independent versus efficiency indicators (Abramo and D’Angelo, 2016) is that confidence in the outcomes of an evaluation requires confidence in the instrument used to conduct it. As aggregated scientometric indicators have become more important within national assessment processes and international university rankings, their properties, advantages and limitations have attracted increased attention, and the meaningful use of citation data has become a critical issue in research evaluation and in policy decisions (van Raan, 2005).

There is broad consensus in the scientometric community that aggregated indicators are inadequate for the purpose of research evaluation, since each indicator, taken separately, can only provide a partial and potentially distorted view of the performance attained by a specific unit of assessment (Hicks et al., 2015, Moed and Halevi, 2015, Van Leeuwen et al., 2003, van Raan, 2006, Vinkler, 2007). This wisdom has been affirmed a fortiori following the introduction of the Hirsch index in 2005 and the wave of Hirsch-type indicators (Bornmann et al., 2011, Schreiber, 2010) that were subsequently proposed as improvements. The overt consensus regarding the rejection of single-number indicators such as the h-index has as its corollary an implicit consensus around a more general principle: when faced with the choice between an aggregated and a disaggregated approach to research evaluation, the latter is to be preferred. In other words, one should use research evaluation instruments that discard as little information as possible and offer a wide and comparatively rich picture of performance.1 One of the contemporary research evaluation instruments that adhere to these desiderata is given by characteristic scores and scales (CSS) for scientific impact assessment (Glänzel and Schubert, 1988, Schubert et al., 1987), which represent an effort towards achieving a multi-dimensional, disaggregated perspective on research performance.

The CSS method was proposed in the late 1980s to assess the eminence of scientific journals on the basis of the citations received by the articles they publish. Its cornerstone idea is that of allowing a parameter-free characterization of citation distributions in such a way that impact classes are defined recursively by appealing to successive (arithmetic) means found within a given empirical distribution. The approach is highly relevant to scientometric evaluation because it addresses one of the fundamental problems associated with the adequate statistical treatment of citation data – the skewness of science (Albarrán et al., 2011, Seglen, 1992) – which makes analysis through standard statistical practice difficult and potentially biased.
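
For concreteness, the recursion can be stated as follows; this is a standard formulation found in the CSS literature, using notation introduced here rather than quoted from the original papers, and the exact treatment of ties at the thresholds varies slightly across studies. Given the citation counts $x_1, \ldots, x_n$ of the papers in a field, set

    $\beta_0 = 0, \qquad \beta_k = \operatorname{mean}\{\, x_i : x_i \geq \beta_{k-1} \,\}, \qquad k = 1, 2, 3.$

Papers with $x_i < \beta_1$ are poorly cited, papers with $\beta_1 \leq x_i < \beta_2$ are fairly cited, papers with $\beta_2 \leq x_i < \beta_3$ are remarkably cited, and papers with $x_i \geq \beta_3$ are outstandingly cited. The median-based variant examined in this article replaces each successive mean with the median of the same subset of papers.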

The aim of this article is to explore a theoretically grounded proposal to modify CSS by changing the reference thresholds used in this evaluative instrument from arithmetic means to medians. To the knowledge of the author, this possibility has only been noted in a single previous study (Egghe, 2010b), where it received only a formal, theoretical exploration in a continuous Lotkaian framework. All empirical studies devoted to the application of CSS (see Section 2.1) have so far relied on the original mean-based approach. As a result, to date there are neither empirical analyses that leverage median-based CSS, nor factual comparisons of any results produced by this instrument with results relying on the mean-based approach. This article addresses these knowledge gaps by examining both mean- and median-based CSS in an application to citation data of journals indexed in the Web of Science categories Information Science and Library Science, Social Work, Microscopy and Thermodynamics. The article also offers a practical implementation of the CSS algorithms in the freely available R language and environment for statistical computing. More generally, the article argues in favor of a disaggregated, inclusive approach to research evaluation and performance assessment.

The article is structured as follows: Section 2 presents the CSS mechanism in more detail, reviews the state of the art with regard to the use of this instrument and puts forward the arguments that justify the need for the alternative, median-based approach; this section also examines the theoretical implications of this shift and provides information on the data used in the empirical investigation together with adjacent methodological notes. Section 3 presents the comparative results of the empirical analyses and highlights the distinctiveness inherent in the application of median-based CSS to citation data. Section 4 summarizes the results and provides a few concluding remarks.

Section snippets

Mean-based CSS in evaluative scientometrics

The fundamental idea of CSS is that of recursively defining certain performance classes for a given empirical distribution of published papers based on the observed number of citations they receive.2 Considering a set of n papers published in a particular field of science, one starts by sorting in descending order the observed citations $\{X_i\}_{i=1}^{n}$ received by each paper. An ordered list of the form $X_1^* \geq X_2^* \geq \cdots \geq X_n^*$
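
The snippet above is truncated, but the construction it describes can be sketched in a few lines of R. The code below is a minimal illustration written for this discussion, not the implementation published with the article, and its handling of ties at the thresholds is one of several possible conventions; the central-tendency function is passed as an argument so that the same routine yields either the mean-based or the median-based variant, and the example data are synthetic.

    # Minimal sketch of the CSS construction (illustrative only).
    # 'central' may be mean or median, covering both CSS variants discussed here.
    css_classes <- function(citations, central = mean, n_classes = 4) {
      thresholds <- numeric(n_classes - 1)
      pool <- citations
      for (k in seq_len(n_classes - 1)) {
        thresholds[k] <- central(pool)        # successive mean (or median)
        pool <- pool[pool >= thresholds[k]]   # retain papers at or above it
      }
      # class 1 = poorly cited, 2 = fairly, 3 = remarkably, 4 = outstandingly cited
      classes <- findInterval(citations, thresholds) + 1
      list(thresholds = thresholds, classes = classes)
    }

    # Hypothetical usage on a skewed synthetic citation distribution
    set.seed(1)
    x <- rnbinom(10000, size = 0.5, mu = 6)
    prop.table(table(css_classes(x, mean)$classes))    # mean-based class shares
    prop.table(table(css_classes(x, median)$classes))  # median-based class shares

On skewed data of this kind the mean-based call concentrates most papers in the poorly cited class, whereas the median-based call assigns, by construction, at most roughly half of the papers to that class; this is the mechanism behind the class-size shifts reported in the comparative results.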

Citation distributions in the four subject categories

Before discussing the comparative results of mean and median-based CSS in the four subject categories selected for analysis it is useful to first provide some preliminary information regarding the citation patterns within each category. A detailed review of the citation patterns found in the four categories is provided in Table 1 which presents the upper thresholds of the deciles of the distributions of citation counts (considered from lowest to highest values) within each category together
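
As a hypothetical illustration of the kind of summary reported in Table 1, the decile upper thresholds of a citation distribution can be obtained directly in R; the citation counts below are invented for the example.

    # Invented citation counts, for illustration only
    citations <- c(0, 0, 0, 1, 1, 2, 3, 4, 6, 9, 15, 27, 58)
    # Upper thresholds of the deciles, from lowest to highest values
    quantile(citations, probs = seq(0.1, 1, by = 0.1), type = 1)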

Concluding remarks

Disaggregation-oriented research evaluation instruments such as characteristic scores and scales are a powerful and persuasive alternative to aggregated indicators. They eschew the methodological pitfalls and simplifying tendencies of the latter and preserve a multidimensional view of research performance. Scientometrics should continually develop disaggregation-oriented instruments especially in light of the increased policy use of scientometric data. This article has offered a comparative

Funding

This research was funded by the Research Institute of the University of Bucharest through a grant awarded to the author in December 2016.

Acknowledgements

The author would like to express his gratitude for the valuable comments and suggestions made by the anonymous reviewers and by the Editor-in-Chief of the journal during the review and revision stages. These helped to improve several aspects of the initial manuscript and led to a fruitful expansion of the empirical analysis.

References (51)

  • G.-A. Vîiu

    A theoretical evaluation of Hirsch-type bibliometric indicators confronted with extreme self-citation

    Journal of Informetrics

    (2016)
  • L. Waltman et al.

    Towards a new crown indicator: An empirical analysis

    Journal of Informetrics

    (2011)
  • L. Waltman et al.

    The elephant in the room: The problem of quantifying productivity in evaluative scientometrics

    Journal of Informetrics

    (2016)
  • A. Agresti et al.

    Statistical Methods for the Social Sciences

    (2009)
  • P. Albarrán et al.

    The skewness of science in 219 sub-fields and a number of aggregates

    Scientometrics

    (2011)
  • P. Albarrán et al.

    References made and citations received by scientific articles

    Journal of the American Society for Information Science and Technology

    (2011)
  • L. Bornmann et al.

    Applying the CSS method to bibliometric indicators used in (university) rankings

    Scientometrics

    (2016)
  • L. Bornmann et al.

    How to evaluate individual researchers working in the natural and life sciences meaningfully? A proposal of methods based on percentiles of citations

    Scientometrics

    (2014)
  • L. Bornmann et al.

    Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine

    Journal of the American Society for Information Science and Technology

    (2008)
  • M.C. Calver et al.

    Should we use the mean citations per paper to summarise a journal's impact or to rank journals in the same field?

    Scientometrics

    (2009)
  • R. Costas et al.

    A bibliometric classificatory approach for the study and assessment of research performance at the individual level: The effects of age on productivity and impact

    Journal of the American Society for Information Science and Technology

    (2010)
  • L. Egghe

    Theory and practise of the g-index

    Scientometrics

    (2006)
  • L. Egghe

    Characteristic scores and scales in a Lotkaian framework

    Scientometrics

    (2010)
  • W. Glänzel

    The role of the h-index and the characteristic scores and scales in testing the tail properties of scientometric distributions

    Scientometrics

    (2010)
  • W. Glänzel

    The application of characteristic scores and scales to the evaluation and ranking of scientific journals

    Journal of Information Science

    (2011)
Cited by (7)

    • Creativity in science and the link to cited references: Is the creative potential of papers reflected in their cited references?

      2018, Journal of Informetrics
      Citation Excerpt:

      The CSS method has been also applied to citations to compare how altmetric events (Mendeley readership counts, tweets and blog mentions) differ from citing patterns (Costas, Haustein, Zahedi, & Larivière, 2016). Vîiu (2017) writes that the cornerstone of the CSS method “is that of allowing a parameter-free characterization of citation distributions in such a way that impact classes are defined recursively by appealing to successive (arithmetic) means found within a given empirical distribution” (p. 749). Therefore, the approach used in CSS addresses one of the fundamental problems in bibliometrics – the skewness of science (Vîiu, 2017).

    • The lognormal distribution explains the remarkable pattern documented by characteristic scores and scales in scientometrics

      2018, Journal of Informetrics
      Citation Excerpt:

      Glänzel (2011) had also reported some nuanced results based on a restricted sample of papers published in 2006 in only three selected fields (with a three year citation window): papers in biophysics/molecular biology closely followed the 70–21–9% pattern but those in applied mathematics showed a 75–18–7% distribution and those in electrical and electronic engineering deviated substantially, having a 63–25–12% configuration. A more recent study also considering a restricted sample of papers published over the 2009–2013 period (Vîiu, 2017) found further support for the more typical 70–21–9% pattern in four Web of Science subject categories. While variations across smaller scale studies are to be expected, a final large scale study that confirms the 70–21–9% pattern deserves separate mention due to its markedly different methodological approach: whereas most of the studies previously mentioned worked within the framework of the predefined Web of Science categories Ruiz-Castillo and Waltman (2015) take a more innovative approach that involves determining scientific fields of variable granularity via algorithmic clustering: based on 3.6 million articles from 2005 to 2008 (a subset arrived at from a more comprehensive pool of about 9.4 million publications from 2003 to 2012) up to 12 distinct classification systems are constructed with between 231 and 11,987 significant clusters (i.e. clusters having at least 100 publications); remarkably, for most of these 12 granularity levels the 70–21–9% pattern is obeyed quite closely, significant departures occurring only in the more fine-grained classifications (granularity levels 9–12) which have a high prevalence of small clusters and where an approximate 67–22–11% pattern seems to prevail.
