Evaluating paper and author ranking algorithms using impact and contribution awards

https://doi.org/10.1016/j.joi.2016.01.010

Highlights

  • Paper and author ranking algorithms are compared and evaluated.

  • Large test data sets that are based on expert opinions are used.

  • Using citation counts is, in general, the best ranking metric for identifying high-impact papers.

  • Author-level Eigenfactor performs best in ranking high-impact authors.

  • Algorithms based on PageRank rank scientifically important papers better.

Abstract

In the work presented in this paper, we analyse ranking algorithms that can be applied to bibliographic citation networks and rank academic entities such as papers and authors. We evaluate how well these algorithms identify important and high-impact entities.

The ranking algorithms are computed on the Microsoft Academic Search (MAS) and the ACM digital library citation databases. The MAS database contains 40 million papers and over 260 million citations that span across multiple academic disciplines, while the ACM database contains 1.8 million papers from the computing literature and over 7 million citations.

We evaluate the ranking algorithms by using a test data set of papers and authors that won renowned prizes at numerous computer science conferences. The results show that using citation counts is, in general, the best ranking metric to measure high impact. However, for certain tasks, such as ranking important papers or identifying high-impact authors, algorithms based on PageRank perform better.

Introduction

Citation analysis is an important tool in the academic community. It can aid universities, funding bodies, and individual researchers to evaluate scientific work and direct resources appropriately. With the rapid growth of the scientific enterprise and the increase of online libraries that include citation analysis tools, the need for a systematic evaluation of these tools becomes more important.

In bibliometrics, citation counts, or metrics that are based directly on citation counts, are still the de facto measurements used to evaluate an entity's quality, impact, influence and importance. However, algorithms that only use citation counts or that are based only on the structure of citation networks can measure quality and importance only to a small degree. What they in fact measure is an entity's impact or popularity, which is not necessarily related to its intrinsic quality or the importance of its contribution to the scientific enterprise. The difficulty is to obtain objective test data that can be used with appropriate evaluation metrics to evaluate ranking algorithms in terms of how well they measure a scientific entity's impact, quality or importance.

In Section 2, background information about the ranking algorithms used is given, and related work in which appropriate test data sets are used is outlined. It shows that previous research has used only small test data sets, covering only one or two fields within computer science, to validate proposed ranking methods.

In this paper we use four different test data sets that are based on expert opinions, each of which is substantially larger than those in previous research, and apply them in different scenarios:

  • 207 papers that won high-impact awards (usually 10–15 years after publication) from 14 different computer science conferences are used to evaluate the algorithms on how well they identify high-impact papers.

  • 464 papers from 32 venues that won best-paper awards at the time of publication are used to see how well venues predict future high-impact papers.

  • From a list of 19 different awards, 268 authors that won one or more prizes for their innovative, significant and enduring contributions to science were collected. This data set is used to evaluate author-ranking algorithms.

  • A list of 129 important papers, sourced from Wikipedia, is used to evaluate how well the algorithms identify important scientific work.

Therefore, this paper focuses on algorithms that are designed to measure a paper's or an author's impact; these algorithms are described in Section 3. In Section 4, the MAS (Microsoft, 2013) and ACM (Association for Computing Machinery, 2014) citation data sets, which are used for the experiments in this article, are described. Section 5 shows the results of evaluating the various ranking algorithms with the above-mentioned test data sets, followed by a discussion of the results in Section 6.

Section snippets

Background information

Algorithms based on the PageRank algorithm have frequently been applied to academic citation networks. For example, Chen, Xie, Maslov, and Redner (2007) apply the algorithm to all American Physical Society publications between 1893 and 2003. They show that there exists a close correlation between a paper's number of citations and its PageRank score, but that important papers, based purely on the authors' opinions, are found by the PageRank algorithm that would not have easily

Ranking algorithms

In this paper CountRank (CR) refers to the method of simply ranking papers according to their citation counts. Let G = (V, E) be a directed citation graph containing n papers in the vertex set V and m citations in the edge set E. A CountRank score CR(i) for each paper i ∈ V can then be calculated using the equation

CR(i) = id(i) / m

where id(i) is the in-degree of vertex i, which corresponds to the number of citations that the paper associated with vertex i has received. The citation counts of papers are
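As a concrete illustration of the CountRank definition above, here is a minimal Python sketch, assuming the citation graph is supplied as a list of (citing, cited) paper-ID pairs; the function name and data layout are illustrative and not taken from the paper.

```python
from collections import Counter

def countrank(edges):
    """Compute CountRank scores CR(i) = id(i) / m for a citation graph.

    `edges` is an iterable of (citing, cited) pairs, so the in-degree
    id(i) of a paper is the number of citations it has received and m
    is the total number of citations (edges) in the graph. Papers that
    are never cited simply receive no entry (i.e. a score of zero).
    """
    edges = list(edges)
    m = len(edges)
    in_degree = Counter(cited for _, cited in edges)
    return {paper: count / m for paper, count in in_degree.items()}

# Example: paper C receives 2 of the m = 3 citations, so CR(C) = 2/3.
citations = [("A", "C"), ("B", "C"), ("A", "B")]
print(countrank(citations))  # {'C': 0.666..., 'B': 0.333...}
```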

The data sets

Microsoft Academic Search (MAS) (Microsoft Research, 2013) is an academic search engine developed by Microsoft Research. The source data set is an integration of various publishing sources such as Springer and ACM.

The entities that are extracted from the data set and processed for the experiments and analyses in the following sections are papers, authors, publication venues and references. The raw counts of these entities are as follows: 39,846,004 papers, 19,825,806 authors and 262,555,262 citations.

Evaluation

For the experiments in this paper, four different types of test data sets are used, all based on expert opinions and collected by hand from Internet sources. Firstly, papers that won high-impact awards at conferences are used to train and evaluate the paper ranking algorithms on how well they identify and rank high-impact papers. The results are shown in Section 5.1. Secondly, a list of papers that won best paper awards at conferences was compiled and used to evaluate how well these
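To make this evaluation setup concrete, the sketch below ranks papers by an arbitrary score and reports where the award-winning papers land in that ranking. The median-rank summary is only an illustrative choice, not necessarily the metric reported in Section 5, and all identifiers are hypothetical.

```python
import statistics

def ranks_of_awarded(scores, awarded_papers):
    """Return the 1-based ranks of award-winning papers under a scoring.

    `scores` maps paper IDs to ranking scores (higher is better) and
    `awarded_papers` is the expert-based test set, e.g. high-impact
    award winners. Papers missing from `scores` are ignored.
    """
    ranking = sorted(scores, key=scores.get, reverse=True)
    position = {paper: rank for rank, paper in enumerate(ranking, start=1)}
    return [position[p] for p in awarded_papers if p in position]

# Hypothetical usage: a ranking algorithm that places the award winners
# near the top of the ranking yields a small median rank.
scores = {"p1": 0.9, "p2": 0.5, "p3": 0.7, "p4": 0.1}
award_winners = {"p1", "p3"}
print(statistics.median(ranks_of_awarded(scores, award_winners)))  # 1.5
```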

Discussion

The results shown in the following discussion are the ones obtained from the experiments using the MAS data set. However, the conclusions drawn from this discussion hold true for the results using the ACM data set as well.

The damping factor of PageRank has multiple uses and implications. The same properties hold true for algorithms that are based on PageRank such as NewRank, YetRank and the Author-Level Eigenfactor metric.

Firstly, when α → 1, more focus is placed on the characteristics of the
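To illustrate the role of the damping factor, here is a minimal power-iteration PageRank sketch on a citation graph. It is a generic textbook formulation, not NewRank, YetRank, or the exact implementation evaluated in this paper: with α close to 1 the scores are dominated by the citation structure, whereas a smaller α gives more weight to the uniform teleportation term.

```python
def pagerank(edges, alpha=0.85, iterations=100):
    """Basic power-iteration PageRank on a directed citation graph.

    `edges` is a list of (citing, cited) pairs. `alpha` is the damping
    factor, i.e. the probability of following a citation rather than
    jumping to a random paper. Dangling papers (no outgoing citations)
    distribute their score uniformly over all papers.
    """
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [] for n in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - alpha) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:  # distribute score along outgoing citations
                share = alpha * score[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:  # dangling paper: spread score uniformly
                for target in nodes:
                    new[target] += alpha * score[node] / n
        score = new
    return score

# Increasing alpha towards 1 shifts weight onto papers that are reached
# through chains of citations; decreasing it pulls all scores towards
# the uniform distribution 1/n.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
print(pagerank(edges, alpha=0.85))
```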

Threats to validity

For all the experiments in Section 5, the CS subset of the MAS data set was used. Therefore, only citations that originate from CS papers or that directly cite CS papers are used. This means that all citations originating from outside the CS domain are weighted the same, which does not reflect their true weight had the entire citation network been considered. Consequently, using the CS citation network has to be seen as an approximation of the entire academic citation network.

Conclusion

Simply counting citations is the best metric for ranking high-impact papers in general. This suggests that citation counts, although surrounded by controversy over their fairness and interpretation (Garfield, 1955), are a good measure of a paper's impact.

However, when the goal is to find important papers and influential authors, metrics based on PageRank outperform the use of citation counts. This was shown by evaluating the author ranking algorithms using a set of authors that won prizes for their innovative, significant and enduring contributions to science.

Author contributions

Conceived and designed the analysis: MD.

Collected the data: MD.

Contributed data or analysis tools: MD.

Performed the analysis: MD.

Wrote the paper: MD.

Supervisor: WV, JG.

Proof-read: WV.
