Compressing multiple scales of impact detection by Reference Publication Year Spectroscopy

https://doi.org/10.1016/j.joi.2015.03.003

Highlights

  • RPYS provides a citation-based technique for exploring historically important papers for a given research field.

  • Up until this point, there has been no standard procedure for comparing results across multiple RPYS analyses.

  • We present a method allowing researchers to analyze and visualize the results of multiple RPYS procedures simultaneously.

Abstract

Reference Publication Year Spectroscopy (RPYS) is a scientometric technique that effectively reveals punctuated peaks of historical scientific impact on a specified research field or technology. In many cases, a seminal discovery serves as the driving force underlying any given peak. Importantly, the results of each RPYS analysis are represented on their own distinct scale, the bounds of which vary considerably across analyses. This makes comparing years of punctuated impact across multiple RPYS analyses problematic. In this paper, we propose a data transformation and visualization technique that resolves this challenge. Specifically, using a rank-order normalization procedure, we compress the results of multiple RPYS analyses into a single, consistent rank scale that clearly highlights years of punctuated impact across RPYS analyses. We suggest that rank transformation increases the effectiveness of this scientometric technique to reveal the scope of historical impact of seminal works by allowing researchers to simultaneously consider results from multiple RPYS analyses.

Introduction

Citations signify the relevance of prior research or invention. In the scientific community, the aggregation of citations attributed to a specific work is commonly taken as a central indicator of its scholarly impact (De Solla Price, 1965, Garfield et al., 1978, Radicchi et al., 2008). More generally, citations and citation counts are thought to represent how knowledge accumulates, combines and transfers to generate new ideas and discoveries. Since citations function as linkages between scientific works, citation records provide an opportunity to quantitatively identify seminal contributions to a given research field or technology (Kostoff and Shlesinger, 2005, Marx et al., 2014, van Raan, 2000).

One technique leveraging citations to detect important scientific contributions is Reference Publication Year Spectroscopy (RPYS). RPYS offers a quantitative approach to assist in identifying the historical roots of research fields and topics (Marx et al., 2014). To accomplish this, RPYS considers the references cited by a cohort of publications resulting from a particular database query. By way of example, consider a search query for topic X that returns only one article. If this article, published in 1990, cites a reference published in 1980, then 1980 is used as a data point in the foregoing analysis. As such, after a set of publications is returned from a database query, the publication date of each cited reference from this set of publications is extracted and mapped onto a frequency distribution sorted by time. The resulting visualization often reveals punctuated peaks in the distribution. These peaks correspond to years containing a larger number of cited references within discrete bins of time. Often, these peaks are driven by a large number of references to a seminal work in the field. To date, RPYS has been successfully applied to investigations of important early contributions in several research topics (Leydesdorff et al., 2014, Marx et al., 2014, Marx and Bornmann, 2013, Wray and Bornmann, 2014).
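The counting step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the input format (a list of dicts with a `cited_years` field) and the function name `rpys_counts` are assumptions made for the example, and the second paper is invented data.

```python
from collections import Counter


def rpys_counts(citing_papers):
    """Count cited references per reference publication year.

    `citing_papers` is a hypothetical input format: an iterable of dicts,
    each holding a "cited_years" list with the publication year of every
    reference that paper cites.
    """
    counts = Counter()
    for paper in citing_papers:
        counts.update(paper["cited_years"])
    # Return the frequency distribution sorted by reference publication year.
    return dict(sorted(counts.items()))


# Toy data: the 1990 article citing a 1980 reference (as in the text),
# plus a second, made-up article.
papers = [
    {"year": 1990, "cited_years": [1980]},
    {"year": 1995, "cited_years": [1967, 1980, 1988]},
]
spectrum = rpys_counts(papers)
print(spectrum)  # {1967: 1, 1980: 2, 1988: 1}
```

In this toy spectrum, 1980 stands out because two citing papers reference work from that year; in a real query, such a peak would prompt a closer look at which cited works drive it.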

There is, however, a major challenge with the current methodology. Namely, the results produced by a given RPYS analysis are represented within their own distinct range or scale, the bounds of which vary considerably across analyses. In other words, using the presently defined RPYS technique, it can be difficult to compare patterns of maxima for the cited references of a small research field with those of a much larger research field. Making RPYS analyses amenable to large-scale comparative analysis is an important extension of the technique for future applications. For instance, it would allow analysts to more readily evaluate whether a large number of research topics show a similar history of important findings as demonstrated through cited works. A second possibility is that estimating the similarities between the citation histories of various subfields might open up an entirely novel avenue for defining the relationships between these subfields, on the assumption that research areas informed by the same seminal works are more similar than those that are not.

To address this shortcoming, we demonstrate here how the addition of a simple data transformation procedure to the standard RPYS methodology can aid in the detection of shared patterns of maxima for the cited references across RPYS analyses, which potentially suggest common historical influences. Specifically, we adopt the use of a rank-transformation procedure commonly used in inferential statistics to perform non-parametric analyses (Conover and Iman, 1981, Labovitz, 1970). This transformation compresses the multiple scales produced from various RPYS analyses into a single rank scale that allows researchers to identify years of punctuated impact across RPYS analyses. We describe a visualization procedure that efficiently represents data from multiple RPYS analyses concurrently.
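To make the idea of compressing several scales into one rank scale concrete, the following Python sketch ranks the per-year values within each analysis separately. It is an illustration under stated assumptions, not the authors' procedure: the direction of ranking (1 = smallest value), the averaging of tied ranks, the function names, and the two made-up series are all choices made for this example.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n (largest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_transform(analyses):
    """Replace each analysis's per-year values with within-analysis ranks."""
    return {name: average_ranks(series) for name, series in analyses.items()}


# Two invented spectra over the same five reference years, on very
# different scales (hypothetical data, not taken from the paper).
analyses = {
    "small_field": [2, 5, 40, 3, 5],
    "large_field": [210, 480, 9500, 300, 450],
}
ranked = rank_transform(analyses)
print(ranked["small_field"])  # [1.0, 3.5, 5.0, 2.0, 3.5]
print(ranked["large_field"])  # [1.0, 4.0, 5.0, 2.0, 3.0]
```

After the transformation, both series live on the same 1-to-5 scale, and the third year receives the top rank in each, even though the raw values differ by two orders of magnitude. Note that the ranks share a common scale only when the analyses cover the same number of years.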

To demonstrate the efficacy of this procedure, we begin with a publication that we suspect a priori has meaningfully impacted numerous research topics: the Viterbi algorithm, first published by Andrew Viterbi in 1967. In this groundbreaking work, Viterbi describes an algorithm that identifies the most likely sequence of hidden states associated with a sequence of known or observed states. The algorithm is widely used in stochastic models and in error correction (decoding convolutional codes), as well as in a variety of computational procedures pertaining to machine speech recognition (Viterbi, 2006). Given this, we performed six RPYS analyses for research topics pulled from the Web of Science that all pertained to the development or use of the Viterbi algorithm (Viterbi, 1967). The Viterbi algorithm's impact and use in a wide array of research communities, from statisticians studying stochastic models to engineers and computer scientists working on various aspects of machine speech recognition, make it an ideal candidate for demonstrating the value of this data-transformation procedure for comparing multiple RPYS analyses concurrently.

Section snippets

Method

We accessed and downloaded data from the Thomson Reuters Web of Science (WoS) between December 12, 2014 and December 14, 2014. We performed topic searches using the Web of Science Core Collection, for which we had back-records from 1974 to 2014. The topic searches performed for our six RPYS analyses were as follows: (1) “Viterbi”, (2) “convolutional code” OR “convolutional codes”, (3) “hidden markov model” OR “hidden markov models”, (4) “continuous speech recognition”, (5) “automatic speech

Results

Results from RPYS analyses are typically represented in one of two ways: (1) the raw number of cited references occurring per year or (2) the absolute deviation of the number of cited references in a particular year from the 5-year median. The second method of visualization is often favored because the raw number of cited references increases toward the present date. By taking the absolute deviation of cited references from a 5-year median, one makes it easier to identify works that might
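The second representation can be sketched as follows. The excerpt does not spell out how the 5-year window is positioned, so this example assumes a window centered on each year (t-2 to t+2) and truncated at the ends of the series. It also returns the signed difference in raw counts, since the excerpt does not settle whether "absolute" means the unsigned magnitude or simply a non-percentage deviation; callers who want the magnitude can apply `abs()`. The function name and data are hypothetical.

```python
from statistics import median


def median_deviation(counts, half_window=2):
    """Deviation of each year's count from the median of a centered window.

    Assumption: the "5-year median" covers years t-2 .. t+2, with the
    window truncated at the start and end of the series.
    """
    deviations = []
    for t, count in enumerate(counts):
        lo = max(0, t - half_window)
        hi = min(len(counts), t + half_window + 1)
        deviations.append(count - median(counts[lo:hi]))
    return deviations


# Invented yearly counts with a slow upward trend and one sharp peak.
counts = [3, 4, 5, 30, 6, 7, 8]
devs = median_deviation(counts)
print(devs)  # [-1, -0.5, 0, 24, -1, -0.5, 1]
```

The gradual rise in the raw counts largely cancels out, while the fourth year remains a clear outlier, which is the behavior that makes this representation useful for spotting peaks.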

Discussion

In this short report, we consider the challenge of comparing results from multiple RPYS analyses. Because the size of different research topics or areas can vary substantially, identifying similarities across individual RPYS analyses is problematic. We show that a simple rank-transform of the resulting vector of the absolute deviation of cited references within RPYS analyses makes them amenable to comparative evaluation. Specifically, the transformation provides

Conflict of interest

JAC works for the non-profit Virginia Tech Applied Research Corporation (VT-ARC), which supports the Air Force Office of Scientific Research (AFOSR). TWH consults for VT-ARC and is the former Chief Scientist of AFOSR. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory.

Acknowledgements

Effort sponsored in whole or in part by the Air Force Research Laboratory, USAF, under Partnership Intermediary No. FA9550-13-3-0001. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

The authors thank Stephanie Carmack and two anonymous reviewers for constructive feedback.

References (14)

  • W. Conover et al. Rank transformations as a bridge between parametric and nonparametric statistics. The American Statistician (1981)
  • D.J. De Solla Price. Networks of scientific papers. Science (1965)
  • E. Garfield et al. Citation data as science indicators. Toward a metric of science: The advent of science indicators (1978)
  • R. Kostoff et al. CAB: Citation-assisted background. Scientometrics (2005)
  • S. Labovitz. The assignment of numbers to rank order categories. American Sociological Review (1970)
  • L. Leydesdorff et al. Referenced Publication Years Spectroscopy applied to iMetrics: Scientometrics, Journal of Informetrics, and a relevant subset of JASIST. Journal of Informetrics (2014)
  • W. Marx et al. Tracing the origin of a scientific legend by reference publication year spectroscopy (RPYS): The legend of the Darwin finches. Scientometrics (2013)
There are more references available in the full text version of this article.
