Evaluating paper and author ranking algorithms using impact and contribution awards
Introduction
Citation analysis is an important tool in the academic community. It can help universities, funding bodies, and individual researchers evaluate scientific work and direct resources appropriately. With the rapid growth of the scientific enterprise and the proliferation of online libraries that include citation analysis tools, the need for a systematic evaluation of these tools becomes more pressing.
In bibliometrics, citation counts, or metrics based directly on them, are still the de facto measurements used to evaluate an entity's quality, impact, influence and importance. However, algorithms that only use citation counts, or that are based only on the structure of citation networks, can measure quality and importance only to a small degree. What they in fact measure is an entity's impact or popularity, which is not necessarily related to its intrinsic quality or the importance of its contribution to the scientific enterprise. The difficulty is to obtain objective test data that can be used with appropriate evaluation metrics to evaluate ranking algorithms in terms of how well they measure a scientific entity's impact, quality or importance.
Section 2 gives background information about the ranking algorithms used and outlines related work in which appropriate test data sets are employed. It shows that previous research has validated proposed ranking methods using only small test data sets that apply to just one or two fields within computer science.
In this paper we use four different test data sets based on expert opinions, each of which is substantially larger than those in previous research, and apply them in different scenarios:
- 207 papers that won high-impact awards (usually 10–15 years after publication) from 14 different computer science conferences are used to evaluate the algorithms on how well they identify high-impact papers.
- 464 papers from 32 venues that won best-paper awards at the time of publication are used to see how well venues predict future high-impact papers.
- From a list of 19 different awards, 268 authors who won one or more prizes for their innovative, significant and enduring contributions to science were collected. This data set is used to evaluate author-ranking algorithms.
- A list of 129 important papers, sourced from Wikipedia, is used to evaluate how well the algorithms identify important scientific work.
This paper therefore focuses on algorithms that are designed to measure a paper's or an author's impact; they are described in Section 3. Section 4 describes the MAS (Microsoft, 2013) and ACM (Association for Computing Machinery, 2014) citation data sets that are used for the experiments in this article. Section 5 presents the results of evaluating the various ranking algorithms with the above-mentioned test data sets, followed by a discussion of the results in Section 6.
Section snippets
Background information
The idea of applying PageRank-based algorithms to academic citation networks has been explored frequently. For example, Chen, Xie, Maslov, and Redner (2007) apply the algorithm to all American Physical Society publications between 1893 and 2003. They show that a paper's number of citations and its PageRank score are closely correlated, but that the PageRank algorithm also finds important papers, judged purely by the authors' opinions, that would not have easily …
Ranking algorithms
In this paper CountRank (CR) refers to the method of simply ranking papers according to their citation counts. Let G = (V, E) be a directed citation graph containing n papers in the vertex set V and m citations in the edge set E. A CountRank score CR(i) for each paper i ∈ V can then be calculated using the equation CR(i) = id(i), where id(i) is the in-degree of vertex i, which corresponds to the number of citations that the paper associated with vertex i has received. The citation counts of papers are …
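As a minimal sketch (the edge-list representation is an assumption for illustration, not the paper's implementation), CountRank can be computed by scoring each paper with its in-degree in the citation graph:

```python
from collections import Counter

def count_rank(edges):
    """CountRank: score each paper by its in-degree, i.e. CR(i) = id(i).

    `edges` is an iterable of (citing, cited) paper-id pairs, a plain
    representation of the directed citation graph G = (V, E).
    """
    indegree = Counter(cited for _, cited in edges)
    # Rank papers by descending citation count.
    return sorted(indegree.items(), key=lambda kv: -kv[1])

# Toy citation graph: paper "c" is cited twice, "b" once.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(count_rank(edges))  # [('c', 2), ('b', 1)]
```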
The data sets
Microsoft Academic Search (MAS) (Microsoft Research, 2013) is an academic search engine developed by Microsoft Research. The source data set is an integration of various publishing sources such as Springer and ACM.
The entities that are extracted from the data set and processed for the experiments and analyses in the following sections are papers, authors, publication venues and references. The raw counts of these entities are as follows: 39,846,004 papers, 19,825,806 authors and 262,555,262 …
Evaluation
For the experiments in this paper, four different types of test data sets are used that are based on expert opinions and were collected by hand from Internet sources. Firstly, papers that won high-impact awards at conferences are used to train and evaluate the paper ranking algorithms on how well they identify and rank high-impact papers. The results are shown in Section 5.1. Secondly, a list of papers that won best paper awards at conferences was compiled and used to evaluate how well these …
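To illustrate how a ranked output can be scored against a hand-collected award set, the sketch below uses mean reciprocal rank; this metric and the identifiers are assumptions for illustration only, not necessarily the exact evaluation metrics used in Section 5.

```python
def mean_reciprocal_rank(ranking, award_papers):
    """Average the reciprocal ranks of award-winning papers within an
    algorithm's ranked output; higher means award papers sit nearer the top."""
    positions = {paper: i + 1 for i, paper in enumerate(ranking)}
    reciprocal = [1.0 / positions[p] for p in award_papers if p in positions]
    return sum(reciprocal) / len(reciprocal) if reciprocal else 0.0

ranking = ["p3", "p1", "p4", "p2"]   # algorithm output, best first
awards = {"p1", "p2"}                # hand-collected test set
print(mean_reciprocal_rank(ranking, awards))  # (1/2 + 1/4) / 2 = 0.375
```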
Discussion
The results shown in the following discussion are the ones obtained from the experiments using the MAS data set. However, the conclusions drawn from this discussion hold true for the results using the ACM data set as well.
The damping factor of PageRank has multiple uses and implications. The same properties hold true for PageRank-based algorithms such as NewRank, YetRank and the Author-Level Eigenfactor metric.
Firstly, when α → 1, more focus is placed on the characteristics of the …
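A plain power-iteration PageRank, shown here as a minimal sketch (not the NewRank or YetRank variants themselves), makes the role of the damping factor α concrete: with probability α a random surfer follows a citation, and with probability 1 − α it jumps to a uniformly random paper, so raising α toward 1 shifts weight onto the citation-graph structure.

```python
def pagerank(edges, alpha=0.85, iters=100):
    """Power-iteration PageRank on a directed citation graph given as
    (citing, cited) pairs; `alpha` is the damping factor."""
    nodes = sorted({n for e in edges for n in e})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - alpha) / n for v in nodes}  # teleport term
        for v in nodes:
            if out[v]:
                share = alpha * score[v] / len(out[v])
                for w in out[v]:
                    nxt[w] += share
            else:
                # Dangling node (no references): spread its mass uniformly.
                for w in nodes:
                    nxt[w] += alpha * score[v] / n
        score = nxt
    return score

scores = pagerank([("a", "c"), ("b", "c"), ("c", "d")], alpha=0.85)
print(scores)  # "c" and "d" outrank the papers that only cite others
```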
Threats to validity
For all the experiments in Section 5, the CS subset of the MAS data set was used. Therefore, only citations that originate from CS papers, or that directly cite CS papers, are used. This means that all citations originating from outside the CS domain are weighted the same, which does not reflect their true weights had the entire citation network been considered. Therefore, using the CS citation network has to be seen as an approximation of the entire academic citation …
Conclusion
Simply counting citations is the best metric for ranking high-impact papers in general. This suggests that citation counts, although surrounded by controversy on their fairness and interpretation (Garfield, 1955), are a good measurement of a paper's impact.
However, when the goal is to find important papers and influential authors, metrics based on PageRank outperform the use of citation counts. This was shown by evaluating the author ranking algorithms using a set of authors that won …
Author contributions
Conceived and designed the analysis: MD.
Collected the data: MD.
Contributed data or analysis tools: MD.
Performed the analysis: MD.
Wrote the paper: MD.
Supervisor: WV, JG.
Proof-read: WV.
References (30)
- et al. The anatomy of a large-scale hypertextual web search engine
- et al. Finding scientific gems with Google's PageRank algorithm. Journal of Informetrics (2007)
- et al. Author ranking based on personalized PageRank. Journal of Informetrics (2015)
- et al. PageRank variants in the evaluation of citation networks. Journal of Informetrics (2014)
- et al. Generalized comparison of graph-based ranking algorithms for publications and authors. Journal of Systems and Software (2006)
- et al. Discovering author impact: A PageRank perspective. Information Processing & Management (2011)
- SIGMOD awards (2014)
- A. M. Turing Award (2012)
- ACM Digital Library (2014)
- et al. The eigenfactor metrics. The Journal of Neuroscience (2008)
- Google Scholar Citations Open To All
- Analysing ranking algorithms and publication trends on scholarly citation networks
- Comparing paper ranking algorithms
- Theory and practise of the g-index. Scientometrics
- Time-aware PageRank for bibliographic networks. Journal of Informetrics