
An algorithm for automatic assignment of reviewers to papers


Abstract

The assignment of reviewers to papers is one of the most important and challenging tasks in organizing scientific conferences, and in the peer review process in general. It is a typical example of an optimization task in which limited resources (reviewers) should be assigned to a number of consumers (papers), so that every paper is evaluated by reviewers who are highly competent in its subject domain, while the reviewers' workload remains balanced. This article suggests a heuristic algorithm for automatic assignment of reviewers to papers that achieves an accuracy of about 98–99% of that of the maximum-weighted-matching (most accurate) algorithms, but has a better time complexity of Θ(n²). The algorithm provides a uniform distribution of papers to reviewers (i.e. all reviewers evaluate roughly the same number of papers); guarantees that if there is at least one reviewer competent to evaluate a paper, then the paper will have a reviewer assigned to it; and allows iterative and interactive execution that can further increase accuracy and enables subsequent reassignments. Both the accuracy and the time complexity are experimentally confirmed by a large number of experiments and proper statistical analyses. Although initially designed to assign reviewers to papers, the algorithm is universal and can be successfully applied in other subject domains where assignment or matching is necessary, for example: assigning resources to consumers or tasks to persons, matching men and women on dating web sites, grouping documents in digital libraries, and others.



Funding

There has been no financial support for this work that could have influenced its outcome.

Author information


Corresponding author

Correspondence to Yordan Kalmukov.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Appendices

Appendix 1: An example

Consider a hypothetical conference with 5 submitted papers and 5 registered reviewers. Each paper should be evaluated by 2 reviewers, so every reviewer should evaluate exactly (5 × 2)/5 = 2 papers.

Let the similarity matrix be as follows, where rows represent papers and columns represent reviewers.

At the beginning, the algorithm sorts each row by similarity factor in descending order. As a result, the first column suggests the most competent reviewer for every single paper.

The first column suggests that reviewer r1 should be assigned to 4 papers (p1, p2, p4 and p5). However, the maximum allowed number of papers per reviewer is 2, i.e. nobody should review more than two papers. So the algorithm has to decide which 2 of these 4 papers to assign to r1. At first glance it seems logical that r1 should be assigned to p1 and p2, as they have the highest similarity factors with him/her. On the other hand, there are fewer reviewers competent to evaluate p4 and p5, so these papers should be processed with priority. If r1 has to be assigned to just one of p4 and p5, which one is more suitable? One may say p4, as it has a higher similarity factor. However, the second-suggested reviewer of p4 is almost as competent as r1, while the second-suggested reviewer of p5 is much less competent than r1. In this case it is better to assign r1 to p5 rather than to p4. If r1 is assigned to p4, then p5 will be evaluated only by less competent reviewers, a situation that is highly undesirable. So when deciding which papers to assign to a specific reviewer, the algorithm should take into account both the number of competent reviewers for each paper and the rate of decrease in the competence of the next-suggested reviewers for those papers. To automate the process, the algorithm modifies the similarity factors from the first column by adding two corrections, C1 and C2, calculated by formulas 3 and 4. C1 takes into account the number of non-zero similarity factors of pi (i.e. the number of reviewers competent to evaluate pi), while C2 depends on the rate of decrease in the competence of the next-suggested reviewers for pi.
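
To make the correction step concrete, here is a minimal Python sketch for one row of the sorted matrix. Since formulas 3 and 4 are given in the main text and not reproduced in this appendix, the concrete forms below are assumptions inferred from the numbers of this example: C1 = 1/k², where k is the number of non-zero similarity factors of the paper, and C2 = 2(s1 − s2), twice the gap between the first- and second-suggested reviewers.

```python
# A minimal sketch of the correction step for one paper. The formulas
# are assumptions inferred from this example, not quoted from the
# article's formulas 3 and 4:
#   C1 = 1 / k**2, k = number of non-zero similarity factors
#   C2 = 2 * (s1 - s2), the doubled drop from the first- to the
#        second-suggested reviewer
def corrections(row):
    """row: one paper's similarity factors, sorted in descending order."""
    k = sum(1 for s in row if s > 0)      # reviewers competent in this paper
    c1 = 1 / k**2 if k else 0.0
    s2 = row[1] if len(row) > 1 else 0.0
    c2 = 2 * (row[0] - s2)
    return c1, c2

# p5 from the example: first reviewer 0.50, second (inferred) 0.33, k = 2:
c1, c2 = corrections([0.50, 0.33])
print(round(c1, 4), round(c2, 4))         # 0.25 0.34
```

Adding these corrections to p5's first-column factor gives 0.50 + 0.25 + 0.34 = 1.09, which matches the modified column shown below.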

The specific values of C1 and C2 are as follows:

C1(p1, r1) = 0.0625

C2(p1, r1) = 2 * 0.05 = 0.1

C1(p2, r1) = 0.0625

C2(p2, r1) = 2 * 0.09 = 0.18

C1(p3, r5) = 0.0625

C2(p3, r5) = 2 * 0.03 = 0.06

C1(p4, r1) = 0.25

C2(p4, r1) = 2 * 0.03 = 0.06

C1(p5, r1) = 0.25

C2(p5, r1) = 2 * 0.17 = 0.34

To preserve the real weight of the matching, the similarity factors should be modified in an auxiliary data structure (an ordinary array) rather than in the matrix itself. Here is the first column stored in a one-dimensional array.

$$ \left[ \begin{array}{ccccc} p_1 & p_2 & p_3 & p_4 & p_5 \\ r_1 \Rightarrow 0.60 & r_1 \Rightarrow 0.89 & r_5 \Rightarrow 0.60 & r_1 \Rightarrow 0.53 & r_1 \Rightarrow 0.50 \end{array} \right] $$

After adding C1 and C2, the first column of the matrix will look like:

$$ \left[ \begin{array}{ccccc} p_1 & p_2 & p_3 & p_4 & p_5 \\ r_1 \Rightarrow 0.76 & r_1 \Rightarrow 1.13 & r_5 \Rightarrow 0.72 & r_1 \Rightarrow 0.84 & r_1 \Rightarrow 1.09 \end{array} \right] $$

As the number of papers per reviewer is 2, r1 is assigned to the 2 papers that have the highest similarity factors with him/her after the modification. These are p2 and p5.

The rows corresponding to p1 and p4 in the similarity matrix are shifted one position to the left, so that the next-competent reviewers are suggested for these papers. As reviewer r1 has already got the maximum allowed number of papers to review, he/she is considered busy and no more papers should be assigned to him/her. Thus all similarity factors between r1 and any paper, outside the first column, are deleted from the matrix. The deletion guarantees that he/she will not be assigned to any more papers.
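
These two operations can be sketched in Python as follows, with each row kept as a list of (revUser, weight) pairs; the helper names are illustrative, not taken from the article's pseudo code.

```python
# Illustrative helpers (names are ours, not the article's pseudo code).
def shift_left(row):
    """Drop a paper's first-column suggestion, so that the
    next-most-competent reviewer moves into the first column."""
    row.pop(0)

def purge_busy_reviewer(SM, rev_user):
    """Delete a busy reviewer's similarity factors everywhere except
    the first column, so he/she is never suggested for another paper."""
    for row in SM.values():
        row[1:] = [(r, w) for (r, w) in row[1:] if r != rev_user]
```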

After shifting p1 and p4 one position left and deleting all occurrences of r1 outside the first column of the matrix, it will look like:

Now r1 is suggested for not 4 but just 2 papers. However, after the operations performed above, r5 is now suggested for 3 papers. To decide which 2 of these 3 papers to assign to r5, the algorithm again modifies the similarity factors taken from the newly formed first column of the matrix. As in the previous step, this is done by formulas 3 and 4. After the modification the first column will look like:

$$ \left[ \begin{array}{ccccc} p_1 & p_2 & p_3 & p_4 & p_5 \\ r_5 \Rightarrow 0.70 & r_1 \Rightarrow 1.13 & r_5 \Rightarrow 0.77 & r_5 \Rightarrow 1.70 & r_1 \Rightarrow 1.09 \end{array} \right] $$

r5 is then assigned to the two papers that have the highest similarity factors with him/her: p3 and p4. The row corresponding to p1 is shifted one position to the left again, so that the next-competent reviewer is suggested for that paper. As r5 already has 2 papers to review, all similarity factors outside the first column that are associated with him/her are deleted from the matrix.

Therefore, the matrix looks like:

As seen in the first column, no reviewer is suggested to evaluate more than 2 papers, so it is now possible to assign all reviewers from the first column directly to the papers they are suggested for. If the last matrix is compared to the initial one, one can see that 3 of the 5 papers (p2, p3 and p5) are assigned to their most competent reviewers. One (p4) got its second-most-competent reviewer, and another (p1) its third-most-competent one. However, the competence of these reviewers with respect to p4 and p1 is very close to that of the most competent reviewers for these papers: r5 is assigned to p4 with a similarity factor of 0.50, while the most competent reviewer for p4 has a similarity factor of 0.53.

Appendix 2: Detailed pseudo code

The detailed pseudo code here can be directly translated into any high-level imperative programming language, as each operation in the pseudo code corresponds to an operator or a built-in function of the chosen language. The complex data structures (mostly arrays) used within the code have the following meaning:

  • SM[i,j]—similarity matrix—a two-dimensional array of arrays, where rows (i) represent papers and columns (j) represent reviewers. Each element contains an associative array of two elements—revUser (the user id of the reviewer who is suggested to evaluate paper i) and weight (the similarity factor between paper i and reviewer revUser).

  • papersOfReviewer[]—an associative array whose keys correspond to the usernames (revUser) of the reviewers who appear in the first column of the similarity matrix, and whose values contain arrays of two elements—the id of the paper suggested to this reviewer, and the similarity factor between the paper and the reviewer.

    For example, an illustrative instance of this structure is shown after this list.
  • rowsToShift[]—an array containing the row ids (these are actually the paper ids) that should be shifted one position to the left, so that the next-competent reviewer is suggested to this paper.

  • signifficantSF[paperId]—an array holding the number of significant (i.e. non-zero) similarity factors for every paper, identified by its paperId.

  • reviewersToRemove[]—an array holding the identifiers (revUser) of the reviewers who are already busy (i.e. have enough papers to review) and should be removed from the similarity matrix (except its first column) on the next pass through the outermost do-while cycle, so that they are not assigned any more papers.

  • busyReviewers[]—an array holding the identifiers of the reviewers who are already busy (i.e. have enough papers to review). This is similar to reviewersToRemove, with one major difference—busyReviewers keeps the identifiers all the time, while reviewersToRemove is cleared on each pass after the respective reviewers (similarity factors) are deleted from the similarity matrix.

  • maxPapersToAssign[j]—an array holding the maximum number of papers that could be assigned to every reviewer j, identified by its revUser.
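
The promised illustration of papersOfReviewer, written as a Python dictionary and mirroring the final state of the Appendix 1 example (the original figure b is not reproduced here):

```python
# Contents of papersOfReviewer after the final pass of the Appendix 1
# example: keys are reviewer usernames (revUser); values are
# [paperId, weight] pairs of the papers suggested to that reviewer.
papersOfReviewer = {
    "r1": [["p2", 0.89], ["p5", 0.50]],
    "r5": [["p3", 0.60], ["p4", 0.50]],
}
```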

(Figures c and d contain the detailed pseudo code of the algorithm itself; they are not reproduced here.)
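
Since the pseudo code figures themselves are not available, the following self-contained Python sketch reassembles the algorithm from the description above and the walkthrough in Appendix 1. It uses the data structures just listed, but the correction formulas and some control-flow details are assumptions inferred from the first step of the Appendix 1 example (C1 = 1/k², C2 = 2(s1 − s2)), so it should be read as an illustration rather than the author's exact pseudo code.

```python
def assign_reviewers(SM, maxPapersToAssign):
    """SM: {paperId: [(revUser, weight), ...]} -- the similarity matrix.
    maxPapersToAssign: {revUser: max number of papers for that reviewer}.
    Returns {paperId: (revUser, weight)} -- one reviewer per paper,
    as in the Appendix 1 walkthrough."""
    for row in SM.values():                      # sort each row descending
        row.sort(key=lambda e: e[1], reverse=True)

    assignment, busyReviewers = {}, set()
    load = {r: 0 for r in maxPapersToAssign}

    while True:
        # Build papersOfReviewer from the current first column, adding
        # the corrections C1 and C2 (assumed forms, see the lead-in).
        papersOfReviewer = {}
        for pid, row in SM.items():
            if pid in assignment or not row:
                continue
            rev, w = row[0]
            k = sum(1 for _, s in row if s > 0)          # signifficantSF
            c1 = 1 / k**2 if k else 0.0
            s2 = row[1][1] if len(row) > 1 else 0.0
            papersOfReviewer.setdefault(rev, []).append(
                (pid, w, w + c1 + 2 * (w - s2)))
        if not papersOfReviewer:
            break

        rowsToShift, reviewersToRemove = [], []
        for rev, papers in papersOfReviewer.items():
            free = maxPapersToAssign[rev] - load[rev]
            papers.sort(key=lambda e: e[2], reverse=True)  # by modified SF
            for pid, w, _ in papers[:free]:                # winners
                assignment[pid] = (rev, w)
                load[rev] += 1
            rowsToShift += [pid for pid, _, _ in papers[free:]]
            if load[rev] >= maxPapersToAssign[rev]:
                reviewersToRemove.append(rev)

        if not rowsToShift:                      # nothing left to resolve
            break
        for pid in rowsToShift:                  # next-competent reviewer
            SM[pid].pop(0)
        busyReviewers.update(reviewersToRemove)
        for pid, row in SM.items():              # purge busy reviewers
            if pid not in assignment:
                row[1:] = [e for e in row[1:] if e[0] not in busyReviewers]

    return assignment
```

Under these assumed formulas, tracing the sketch on the Appendix 1 data yields the same final pairing as the walkthrough: p2 and p5 go to r1, p3 and p4 go to r5, and p1 receives its next-competent free reviewer.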


Cite this article

Kalmukov, Y. An algorithm for automatic assignment of reviewers to papers. Scientometrics 124, 1811–1850 (2020). https://doi.org/10.1007/s11192-020-03519-0
