Evaluating paper and author ranking algorithms using impact and contribution awards

https://doi.org/10.1016/j.joi.2016.01.010

Highlights

  • Paper and author ranking algorithms are compared and evaluated.

  • Large test data sets that are based on expert opinions are used.

  • Using citation counts is, in general, the best ranking metric for identifying high-impact papers.

  • Author-level Eigenfactor performs best in ranking high-impact authors.

  • Algorithms based on PageRank rank scientifically important papers better.

Abstract

In the work presented in this paper, we analyse ranking algorithms that can be applied to bibliographic citation networks and rank academic entities such as papers and authors. We evaluate how well these algorithms identify important and high-impact entities.

The ranking algorithms are computed on the Microsoft Academic Search (MAS) and the ACM digital library citation databases. The MAS database contains 40 million papers and over 260 million citations that span across multiple academic disciplines, while the ACM database contains 1.8 million papers from the computing literature and over 7 million citations.

We evaluate the ranking algorithms by using a test data set of papers and authors that won renowned prizes at numerous computer science conferences. The results show that using citation counts is, in general, the best ranking metric to measure high impact. However, for certain tasks, such as ranking important papers or identifying high-impact authors, algorithms based on PageRank perform better.

Introduction

Citation analysis is an important tool in the academic community. It can aid universities, funding bodies, and individual researchers to evaluate scientific work and direct resources appropriately. With the rapid growth of the scientific enterprise and the increase of online libraries that include citation analysis tools, the need for a systematic evaluation of these tools becomes more important.

In bibliometrics, citation counts, or metrics that are based directly on citation counts, are still the de facto measurements used to evaluate an entity's quality, impact, influence and importance. However, algorithms that only use citation counts or that are based only on the structure of citation networks can measure quality and importance only to a small degree. What they in fact measure is an entity's impact or popularity, which is not necessarily related to its intrinsic quality or the importance of its contribution to the scientific enterprise. The difficulty is to obtain objective test data that can be used with appropriate evaluation metrics to evaluate ranking algorithms in terms of how well they measure a scientific entity's impact, quality or importance.

In Section 2, background information about the ranking algorithms used is given, and related work in which appropriate test data sets are used is outlined. It shows that previous research has used only small test data sets, covering only one or two fields within computer science, to validate proposed ranking methods.

In this paper we use four different test data sets that are based on expert opinions, each of which is substantially larger than those in previous research, and apply them in different scenarios:

  • 207 papers that won high-impact awards (usually 10–15 years after publication) from 14 different computer science conferences are used to evaluate the algorithms on how well they identify high-impact papers.

  • 464 papers from 32 venues that won best-paper awards at the time of publication are used to see how well venues predict future high-impact papers.

  • From a list of 19 different awards, 268 authors that won one or more prizes for their innovative, significant and enduring contributions to science were collected. This data set is used to evaluate author-ranking algorithms.

  • A list of 129 important papers, sourced from Wikipedia, is used to evaluate how well the algorithms identify important scientific work.

Therefore, this paper focuses on algorithms that are designed to measure a paper's or an author's impact; these algorithms are described in Section 3. In Section 4, the MAS (Microsoft, 2013) and ACM (Association for Computing Machinery, 2014) citation data sets, which are used for the experiments in this article, are described. Section 5 shows the results of evaluating the various ranking algorithms with the above-mentioned test data sets, followed by a discussion of the results in Section 6.

Section snippets

Background information

Algorithms based on the PageRank algorithm have frequently been applied to academic citation networks. For example, Chen, Xie, Maslov, and Redner (2007) apply the algorithm to all American Physical Society publications between 1893 and 2003. They show that there exists a close correlation between a paper's number of citations and its PageRank score, but that important papers, based purely on the authors' opinions, are found by the PageRank algorithm that would not have easily

Ranking algorithms

In this paper CountRank (CR) refers to the method of simply ranking papers according to their citation counts. Let G = (V, E) be a directed citation graph containing n papers in the vertex set V and m citations in the edge set E. A CountRank score CR(i) for each paper i ∈ V can then be calculated using the equation

CR(i) = id(i) / m

where id(i) is the in-degree of vertex i, which corresponds to the number of citations that the paper associated with vertex i has received. The citation counts of papers are
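As a concrete illustration of the CountRank definition above, here is a minimal Python sketch, assuming the citation graph is supplied as a list of (citing, cited) paper-ID pairs; the function name and data layout are illustrative and not taken from the paper.

```python
from collections import Counter

def countrank(edges):
    """Compute CountRank scores CR(i) = id(i) / m for a citation graph.

    `edges` is an iterable of (citing, cited) pairs, so the in-degree
    id(i) of a paper is the number of citations it has received and m
    is the total number of citations (edges) in the graph. Papers that
    are never cited simply receive no entry (i.e. a score of zero).
    """
    edges = list(edges)
    m = len(edges)
    in_degree = Counter(cited for _, cited in edges)
    return {paper: count / m for paper, count in in_degree.items()}

# Example: paper C receives 2 of the m = 3 citations, so CR(C) = 2/3.
citations = [("A", "C"), ("B", "C"), ("A", "B")]
print(countrank(citations))  # {'C': 0.666..., 'B': 0.333...}
```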

The data sets

Microsoft Academic Search (MAS) (Microsoft Research, 2013) is an academic search engine developed by Microsoft Research. The source data set is an integration of various publishing sources such as Springer and ACM.

The entities that are extracted from the data set and processed for the experiments and analyses in the following sections are papers, authors, publication venues and references. The raw counts of these entities are as follows: 39,846,004 papers, 19,825,806 authors and 262,555,262 citations.

Evaluation

For the experiments in this paper, four different types of test data sets are used, all based on expert opinions and collected by hand from Internet sources. Firstly, papers that won high-impact awards at conferences are used to train and evaluate the paper ranking algorithms on how well they identify and rank high-impact papers. The results are shown in Section 5.1. Secondly, a list of papers that won best paper awards at conferences was compiled and used to evaluate how well these
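To make this evaluation setup concrete, the sketch below ranks papers by an arbitrary score and reports where the award-winning papers land in that ranking. The median-rank summary is only an illustrative choice, not necessarily the metric reported in Section 5, and all identifiers are hypothetical.

```python
import statistics

def ranks_of_awarded(scores, awarded_papers):
    """Return the 1-based ranks of award-winning papers under a scoring.

    `scores` maps paper IDs to ranking scores (higher is better) and
    `awarded_papers` is the expert-based test set, e.g. high-impact
    award winners. Papers missing from `scores` are ignored.
    """
    ranking = sorted(scores, key=scores.get, reverse=True)
    position = {paper: rank for rank, paper in enumerate(ranking, start=1)}
    return [position[p] for p in awarded_papers if p in position]

# Hypothetical usage: a ranking algorithm that places the award winners
# near the top of the ranking yields a small median rank.
scores = {"p1": 0.9, "p2": 0.5, "p3": 0.7, "p4": 0.1}
award_winners = {"p1", "p3"}
print(statistics.median(ranks_of_awarded(scores, award_winners)))  # 1.5
```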

Discussion

The results shown in the following discussion are the ones obtained from the experiments using the MAS data set. However, the conclusions drawn from this discussion hold true for the results using the ACM data set as well.

The damping factor of PageRank has multiple uses and implications. The same properties hold true for algorithms that are based on PageRank such as NewRank, YetRank and the Author-Level Eigenfactor metric.

Firstly, when α → 1, more focus is placed on the characteristics of the
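To illustrate the role of the damping factor, here is a minimal power-iteration PageRank sketch on a citation graph. It is a generic textbook formulation, not NewRank, YetRank, or the exact implementation evaluated in this paper: with α close to 1 the scores are dominated by the citation structure, whereas a smaller α gives more weight to the uniform teleportation term.

```python
def pagerank(edges, alpha=0.85, iterations=100):
    """Basic power-iteration PageRank on a directed citation graph.

    `edges` is a list of (citing, cited) pairs. `alpha` is the damping
    factor, i.e. the probability of following a citation rather than
    jumping to a random paper. Dangling papers (no outgoing citations)
    distribute their score uniformly over all papers.
    """
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [] for n in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - alpha) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:  # distribute score along outgoing citations
                share = alpha * score[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:  # dangling paper: spread score uniformly
                for target in nodes:
                    new[target] += alpha * score[node] / n
        score = new
    return score

# Increasing alpha towards 1 shifts weight onto papers that are reached
# through chains of citations; decreasing it pulls all scores towards
# the uniform distribution 1/n.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
print(pagerank(edges, alpha=0.85))
```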

Threats to validity

For all the experiments in Section 5, the CS subset of the MAS data set was used. Therefore, only citations that originate from CS papers or that directly cite CS papers are used. This means that all citations originating from outside the CS domain are weighted the same, which does not reflect their true weight had the entire citation network been considered. Consequently, using the CS citation network has to be seen as an approximation of the entire academic citation network.

Conclusion

Simply counting citations is the best metric for ranking high-impact papers in general. This suggests that citation counts, although surrounded by controversy over their fairness and interpretation (Garfield, 1955), are a good measure of a paper's impact.

However, when the goal is to find important papers and influential authors, metrics based on PageRank outperform the use of citation counts. This was shown by evaluating the author ranking algorithms using a set of authors that won prizes for their innovative, significant and enduring contributions to science.

Author contributions

Conceived and designed the analysis: MD.

Collected the data: MD.

Contributed data or analysis tools: MD.

Performed the analysis: MD.

Wrote the paper: MD.

Supervisor: WV, JG.

Proof-read: WV.
