Improving the reliability of short-term citation impact indicators by taking into account the correlation between short- and long-term citation impact

https://doi.org/10.1016/j.joi.2020.101019

Highlights

  • The normalized indicator may not be reliable when a short citation window is used.

  • To solve this problem, we introduce a weighting factor to the normalized indicator.

  • The weight reflects the degree of reliability of the normalized indicator.

  • The weighted indicator overcomes the disadvantage of the new crown indicator.

  • After introducing the weight, some universities’ rankings changed dramatically.

Abstract

Normalized citation indicators may not be sufficiently reliable when a short citation time window is used, because recently published papers have had little time to accumulate citations, and their citation counts are therefore less reliable than those of papers published many years ago. Normalization methods themselves cannot solve this problem. To address it, we introduce a weighting factor into the commonly used normalized indicator, the Category Normalized Citation Impact (CNCI), at the paper level. The weighting factor, calculated as the correlation coefficient between the citation counts of papers in the given short citation window and those in a fixed long citation window, reflects the degree of reliability of a paper's CNCI value. To verify the effect of the proposed weighted CNCI indicator, we compared the CNCI scores and rankings of 500 universities before and after introducing the weighting factor. The results show that although scores and rankings before and after the weighting are strongly positively correlated, some universities' performance and rankings changed dramatically.

Introduction

The citation count is an important and commonly used indicator to measure the impact of papers in research evaluation. However, owing to the differences in citation practices among different fields, the citation count of papers cannot be compared directly across fields. For example, citation counts for papers in biology and biomedicine are in general significantly higher than those in mathematics. The Essential Science Indicators (ESI) updated in January 2019 reported that the average number of citations of the ten most cited papers in Biology and Biochemistry was 9,156, whereas it was only 1,413 in Mathematics (Clarivate Analytics, 2019). Taking this into account, citation-based field normalized indicators should be used to make a fair comparison of papers across fields, which is especially important in the practice of research evaluation.

A field normalization method is a mathematical transformation that attempts to eliminate field differences in citation counts. Once these differences are eliminated, citation impact can be compared across fields directly by comparing normalized citation counts.

Many normalization methods have been proposed, such as the mean-based method (Lundberg, 2007; Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011b), z-score method (Vaccario, Medo, Wider, & Mariani, 2017), percentile rank method (Bornmann, Leydesdorff, & Wang, 2013; Leydesdorff & Bornmann, 2011), reverse engineering method (Radicchi & Castellano, 2012), and citing side method (Leydesdorff & Opthof, 2010; Waltman & van Eck, 2013; Zitt & Small, 2008). These methods or indicators have been applied to the research evaluation of individuals (Bouyssou & Marchant, 2016; Leydesdorff & Bornmann, 2011), institutions (Ahlgren, Yue, Rousseau, & Yang, 2017; Franceschini, Maisano, & Mastrogiacomo, 2013; Prathap, 2014; Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011a), countries (Adams, 2018; Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011a), and journals (Moed, 2010; Waltman, van Eck, van Leeuwen, & Visser, 2013). However, these normalization indicators are not free of limitations.

One concern is the reliability of field normalized indicators when a short citation time window (e.g., 2 years) is used: the citation counts for recently published papers are not as reliable as those for papers published many years ago, because such a limited period is usually insufficient for publications to accumulate citations until the count stabilizes. In other words, the reliability of citations is related to the length of the citation time window. However, normalization methods themselves cannot solve this problem (Waltman et al., 2011a; Wang, 2013).

Many previous studies indicated that using short-term citation impact indicators for research evaluation is not reliable. For example, Wang (2013) found that the correlation coefficient between the citation counts in a two-year citation window and those in the more reliable citation window of 31 years was only 0.592 for all fields, and the correlation coefficients were even lower in the fields of Engineering Technology and Mathematics, dropping to 0.466 and 0.386, respectively. The results of Nederhof, van Leeuwen, and Clancy (2012) showed that using a short citation time window (1–4 years) is detrimental for impact assessment of research in the field of space life and physical sciences in comparison with ground-based life and physical sciences. They argued that publications from the space life and physical sciences require a longer citation time window, which would yield more reliable results than a shorter one. Wang, Song, and Barabasi (2013) found that papers collecting the same number of citations within a five-year span had widely different long-term impacts. They also found that the correlation between early- and long-term citations was low for the discoveries with the most long-term citations. Abramo, D'Angelo, and Felici (2019) showed that the long-term impact of publications with a small number of early citations cannot be predicted with the same accuracy as for those with a large number of early citations. They thus pointed out that papers with zero or a small number of early citations should be given special attention when conducting research evaluation, especially considering these papers' substantial shares.

Simultaneously, using a short citation time window to evaluate the recent research performance of journals, institutions, and countries is usually unavoidable in actual research evaluation practice (e.g., annually released university and journal rankings usually evaluate the impact of recently published papers). The stakeholders in research evaluation (e.g., scientific policy makers or decision makers) usually cannot wait for decades to evaluate the research impact of publications. Existing normalized citation indicators are therefore clearly limited if the length of the citation time window is not taken into account.

To address the problem described above, we propose the following solution:

When a short citation time window is used, the normalized citation count should be weighted by a factor that indicates its degree of reliability. The reliability factor can be calculated as the correlation coefficient between the citation counts in the short time window and those in a fixed, reliably long citation time window (e.g., 31 years). For example, if the correlation coefficient between the citation counts that chemistry papers accumulate within 2 years and within 31 years of publication is 0.55, the normalized citation count of a chemistry paper observed with a 2-year citation window should be multiplied by 0.55 to obtain a reliable estimate of its scientific impact. The shorter (longer) the time window after publication, the lower (higher) the correlation coefficient and hence the degree of reliability.
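As a concrete illustration of this weighting scheme, the following minimal Python sketch computes a reliability weight for a hypothetical set of chemistry papers; the citation counts and the use of `pearsonr` are our own assumptions for illustration, not data or code from the study:

```python
# Minimal sketch (not the authors' code): the reliability weight for a field
# and a short citation window is the Pearson correlation between the citation
# counts papers accumulate in that short window and in a long reference window.
from scipy.stats import pearsonr

# Hypothetical citation counts for the same six chemistry papers,
# counted 2 years and 31 years after publication.
citations_2yr = [1, 4, 0, 7, 2, 5]
citations_31yr = [10, 52, 3, 80, 15, 60]

weight, _ = pearsonr(citations_2yr, citations_31yr)

# A paper's normalized citation score (e.g., CNCI) observed in the short
# window is then discounted by this weight to reflect its reliability.
cnci_2yr = 1.8                      # hypothetical CNCI of one paper
weighted_cnci = weight * cnci_2yr
print(f"weight = {weight:.2f}, weighted CNCI = {weighted_cnci:.2f}")
```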

In this study, we first summarize the correlation coefficients in various fields between the citation counts in each time window of 1–10 years and the citation counts in the reliable long time window of 31 years. These correlation coefficients were subsequently introduced as weighting factors applied, at the paper level, to the commonly used mean-based normalization indicator Category Normalized Citation Impact (CNCI), which is embedded in InCites, a comprehensive research evaluation platform developed by Clarivate Analytics. The effect of the proposed weighted CNCI indicator should be verified and explored at various aggregation levels, such as individual researchers, research groups, institutions (including universities), countries, and journals. Owing to space limitations, we only study the effect of the weighted CNCI indicator at the university level, with which we are relatively familiar. Taking the top 500 universities in the Shanghai Ranking as an example, we investigated the effect of the weighting from the perspective of the changes of scientific impact at the university level. Fig. 1 shows our research ideas in a schematic representation.
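The aggregation step can be sketched as follows; since Eq. (3) is only referenced later, this reading (a university's WCNCI as the mean of paper-level weighted CNCI values) is an assumption, and all field names, weights, and CNCI values below are hypothetical:

```python
# Sketch of the aggregation step (our reading of the method, not the authors'
# code): each paper's CNCI is multiplied by the reliability weight for its
# field and citation-window length, and a university's WCNCI is taken as the
# mean of these weighted paper-level values.

# weights[field][window_years], from correlations like the one computed above
weights = {"Chemistry": {2: 0.55, 10: 0.90},
           "Mathematics": {2: 0.39, 10: 0.80}}

papers = [  # (field, citation window in years, CNCI) for one university
    ("Chemistry", 2, 1.80),
    ("Chemistry", 10, 1.10),
    ("Mathematics", 2, 0.90),
]

wcnci = sum(weights[f][w] * cnci for f, w, cnci in papers) / len(papers)
print(f"university WCNCI = {wcnci:.3f}")
```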

Many studies have used predictive variables to predict the long-term impact of a paper. For example, Wang et al. (2013) established a mechanistic model comprising three fundamental mechanisms, "Preferential attachment," "Aging," and "Fitness," to quantify the long-term impact of a paper. Abramo et al. (2019) examined the accuracy of predicting the long-term impact of a paper by combining its early citations with the impact factor of the hosting journal. Bornmann, Leydesdorff, and Wang (2014) explored the possibility of improving the prediction of long-term citation impact based on early citations, the journal impact factor, and other variables such as the number of authors, the number of references, and the number of pages. At present, the predictive variables most commonly used in the existing literature are the early citations of a paper (Abrishami & Aliakbary, 2019) or a combination of the early citations of the paper and the impact factor of the hosting journal (Abramo et al., 2019; Levitt & Thelwall, 2011; Stegehuis, Litvak, & Waltman, 2015; Stern, 2014). Furthermore, in addition to the early citations of a paper and the journal impact factor, some studies have added other predictive variables to models of long-term citation impact. Following the classification framework proposed by Tahamtan, Afshar, and Ahamdzadeh (2016), these other predictive variables can mainly be grouped into three categories:

  • 1) "paper related factors," such as the quality and novelty of a paper (Bai, Zhang, & Lee, 2019; Wang et al., 2013), the decay over time of a paper's impact (Bai et al., 2019; Wang et al., 2013), the number of references (Bornmann et al., 2014), and the number of pages (Bornmann et al., 2014);

  • 2) "journal related factors," such as journal ranking (Kosteas, 2018);

  • 3) "author related factors," such as the number of authors (Bornmann et al., 2014).

Additionally, with the rise of altmetrics in recent years, some researchers have used alternative indicators (e.g., online views, downloads, and tweets) as predictive variables to predict a paper’s long-term citation impact (Shema, Bar-Ilan, & Thelwall, 2014; Thelwall & Nevill, 2018; Thelwall & Sud, 2016; Wang, Wang, & Chen, 2019).

Compared with these previous studies, our research differs in the following respects. These studies mainly focused on predicting the future impact of a paper, whereas our study focuses on evaluating the existing citations of a paper. Specifically, the idea in previous studies was to establish a model based on predictive variables to predict the long-term impact of a paper. However, in an actual research evaluation process, it is usually difficult for decision makers and strategic planners from universities and other institutions to obtain every variable value for every paper and then build a prediction model for each paper's impact; this modeling approach is too complex for them. The idea of our research, in contrast, is to introduce a weighting factor into the normalized citation value of a paper (e.g., CNCI in this study) to indicate the degree of reliability of its impact. This weighting factor is related to the length of the citation time window, and thus provides a reliable evaluation score of the impact of the paper.

Interestingly, the weighted CNCI indicator (i.e., WCNCI, see Eq. (3)) we propose in this study retains the advantages of the CPP/FCSm indicator (i.e., the crown indicator) (De Bruin, Kint, Luwel, & Moed, 1993; Moed, De Bruin, & van Leeuwen, 1995). The CPP/FCSm indicator is defined as:

$$\mathrm{CPP/FCSm}=\frac{\sum_{i=1}^{n}c_{i}}{\sum_{i=1}^{n}e_{i}}=\frac{1}{n}\sum_{i=1}^{n}w_{i}\,\frac{c_{i}}{e_{i}},\qquad w_{i}=\frac{e_{i}}{\sum_{j=1}^{n}e_{j}/n}$$

where $c_i$ is the raw citation count of paper $i$, $e_i$ is its expected citation count (i.e., the average citation count of papers with the same subject, publication year, and document type as the given paper), and $n$ is the number of publications in the set. In fact, the CPP/FCSm indicator can be seen as a weighted version of the MNCS indicator (the so-called new crown indicator, presented as Eq. (2) (Lundberg, 2007; Waltman et al., 2011b)). The weight $w_i$ in the CPP/FCSm indicator implicitly allocates more (less) weight to older (more recent) papers, because older (more recent) papers naturally have a relatively higher (lower) $e$ value. However, the weight $w_i$ simultaneously allocates more (less) weight to papers from fields with a higher (lower) expected number of citations, which is unreasonable and contrary to the original intention of field normalization, namely to eliminate field differences in citation counts so that normalized citations are comparable across fields. The weight in the WCNCI indicator proposed in this paper directly and only gives more (less) weight to older (more recent) papers, which retains the advantage of CPP/FCSm while avoiding its disadvantage. Therefore, compared with the CPP/FCSm indicator, the weight in our proposed indicator is more reasonable and has a more explicit meaning. At the same time, our weight overcomes the disadvantage of the MNCS indicator, which weights all papers equally regardless of their citation time windows; thus, our proposed weighted MNCS indicator (i.e., WCNCI) incorporates the advantages of both the CPP/FCSm and MNCS indicators.
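To make the identity in the equation above tangible, here is a small numeric check (with hypothetical citation and expected-citation values of our choosing) showing that CPP/FCSm is exactly the weighted mean of the per-paper ratios, and how it diverges from the unweighted MNCS when expected citation values differ strongly across fields:

```python
# Numeric sketch (hypothetical numbers) of the identity in the equation above:
# CPP/FCSm equals a weighted mean of the per-paper ratios c_i / e_i, with
# weight w_i = e_i / (mean of the e values), so papers with a high e value
# (older papers, or papers from high-citation fields) implicitly count for
# more than in the unweighted MNCS mean.
c = [10, 2]          # raw citation counts of two papers
e = [20, 1]          # expected citations (high-e field vs. low-e field)
n = len(c)

cpp_fcsm = sum(c) / sum(e)                       # crown indicator
mncs = sum(ci / ei for ci, ei in zip(c, e)) / n  # new crown indicator

# Rewriting CPP/FCSm as a weighted MNCS reproduces the same value:
w = [ei / (sum(e) / n) for ei in e]
cpp_fcsm_weighted = sum(wi * ci / ei for wi, ci, ei in zip(w, c, e)) / n

print(cpp_fcsm, cpp_fcsm_weighted)  # identical: ~0.571 and ~0.571
print(mncs)                         # 1.25: the high-e paper no longer dominates
```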


Data collection

The top 500 universities in the Shanghai Ranking 2017 were selected as samples to investigate the effect of weighting by taking into account the reliability of the citation window. We downloaded the information on papers published by these universities between 2007 and 2016 from InCites. The information included the CNCI value (counted until November 2017, with citation time windows of 1–10 years).

Comparison of the universities’ performance: CNCI and WCNCI

Fig. 3 shows scatter plots of the WCNCI scores against the CNCI scores of the universities, and Fig. 4 shows the comparison of rankings for these two indicators. The results show that there was a strong positive correlation between the WCNCI and CNCI scores, where the correlation coefficient was 0.987 (p < 0.001); there was also a strong positive correlation between the rankings under WCNCI and CNCI, where the correlation coefficient was 0.985 (p < 0.001).
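For readers who wish to reproduce this kind of comparison, a minimal sketch follows; the five score pairs are invented placeholders, not the study's data, and the ranking correlation is computed as a Pearson correlation on ranks (equivalently, a Spearman correlation):

```python
# Sketch of the comparison reported above (hypothetical score arrays): Pearson
# correlation between universities' CNCI and WCNCI scores, and between the
# rankings the two indicators induce.
from scipy.stats import pearsonr, rankdata

cnci_scores = [1.52, 1.31, 1.05, 0.98, 0.87]   # toy data for five universities
wcnci_scores = [1.21, 1.08, 0.80, 0.81, 0.66]

r_scores, p_scores = pearsonr(cnci_scores, wcnci_scores)
r_ranks, p_ranks = pearsonr(rankdata(cnci_scores), rankdata(wcnci_scores))
print(f"scores: r = {r_scores:.3f}; rankings: r = {r_ranks:.3f}")
```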

Unsurprisingly, both the scores and rankings of some universities changed dramatically after the weighting factor was introduced.

Discussion

There has been considerable debate about whether the CPP/FCSm indicator or the MNCS indicator is more appropriate. Generally, most scholars seem to prefer the MNCS indicator because it has a clear physical meaning and can be easily explained. However, one obvious limitation of the MNCS indicator is that it weights all papers with different citation time windows equally. Older (more recent) papers should be allocated more (less) weight to reflect the reliability of their citation counts.

Conclusion

A mean-based field normalized citation impact indicator, such as CNCI, is one of the most commonly used indicators to measure the citation impact of papers, researchers, institutions, and countries. However, an obvious limitation of such an indicator is that it may not be sufficiently reliable when a short citation time window is used. Generally, the older a paper is, the more reliable its citations are. If papers were published more recently, then there has not been sufficient time for them to accumulate citations.

Author contributions

Xing Wang: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Performed the analysis; Wrote the paper.

Zhihui Zhang: Conceived and designed the analysis; Wrote the paper.

Acknowledgment

This research was supported by MOE (Ministry of Education in China) Project of Humanities and Social Science (Project No. 17YJCZH179).

References (49)

  • J.M. Levitt et al., A combined bibliometric indicator to predict article impact, Information Processing & Management (2011)

  • J. Lundberg, Lifting the crown—Citation z-score, Journal of Informetrics (2007)

  • H.F. Moed, Measuring contextual citation impact of scientific journals, Journal of Informetrics (2010)

  • P.D.B. Parolo et al., Attention decay in science, Journal of Informetrics (2015)

  • C. Stegehuis et al., Predicting the long-term citation impact of recent publications, Journal of Informetrics (2015)

  • M. Thelwall et al., Could scientists use Altmetric.com scores to predict longer term citation counts?, Journal of Informetrics (2018)

  • G. Vaccario et al., Quantifying and suppressing ranking bias in a large citation network, Journal of Informetrics (2017)

  • L. Waltman et al., A systematic empirical comparison of different approaches for normalizing citation impact indicators, Journal of Informetrics (2013)

  • L. Waltman et al., Some modifications to the SNIP journal impact indicator, Journal of Informetrics (2013)

  • L. Waltman et al., Towards a new crown indicator: Some theoretical considerations, Journal of Informetrics (2011)

  • J. Adams, Early citation counts correlate with accumulated impact, Scientometrics (2005)

  • P. Ahlgren et al., The role of the Chinese Key Labs in the international and national scientific arena revisited, Research Evaluation (2017)

  • P. Albarrán et al., The skewness of science in 219 sub-fields and a number of aggregates, Scientometrics (2011)

  • A. Arimoto, Declining symptom of academic productivity in the Japanese research university sector, Higher Education (2015)