Abstract
Assigning appropriate reviewers to a manuscript from a pool of candidate reviewers is a common challenge in the academic community. Current word- and semantic-based approaches treat the reviewer assignment problem (RAP) as an information retrieval problem but do not take into account two constraints of the RAP: incompleteness of the reviewer data and interference from nonmanuscript-related papers. In this paper, a word and semantic-based iterative model (WSIM) is proposed to account for the constraints of the RAP by improving the similarity calculations between reviewers and manuscripts. First, we use the improved language model and topic model to extract word features and semantic features to represent reviewers and manuscripts. Second, we use a similarity metric based on the normalized discounted cumulative gain (NDCG) to measure semantic similarity. This metric ignores the probability value (quantitative exact value) of the topic and considers only the ranking (qualitative relevance), thus reducing overfitting to incomplete reviewer data. Finally, we use an iterative model to reduce the interference from nonmanuscript-related papers in the reviewer data. This approach considers the similarity between the manuscript and each of the reviewer’s papers. We evaluate the proposed WSIM on two real datasets and compare its performance to that of seven existing methods. The experimental results show that the WSIM improves the recommendation accuracy by at least 2.5% on the top 20.
1 Introduction
Paper-reviewer recommendation refers to the automated process for selecting candidates to perform peer review. This enables journal and conference committees to match papers quickly and accurately to reviewers (McGlinchey et al. 2019). Manually conducting this pairing is labor intensive. Furthermore, it is difficult for a nonprofessional chair to assign suitable matches. Many reviewer assignment systems exist to automate this process (e.g., the Toronto paper matching system (Charlin and Zemel 2013), SubSift (Flach et al. 2010), the Microsoft conference management toolkit, the global review assignment processing engine (GRAPE) (Di Mauro et al. 2005), Erie (Li and Hou 2016), the advanced reviewer assignment system (Kou et al. 2015b), and a decision support system (Hoang et al. 2019)). These systems are completely automated and have been used for many real conferences (e.g., NIPS, ICML, CVPR, ICCV, and ACML).
The problem of paper-reviewer recommendation is known as the reviewer assignment problem (RAP) (Tayal et al. 2014). Dumais and Nielsen (1992) were the first to address this problem. They treated the RAP as an information retrieval issue and used the latent semantic indexing (LSI) model to establish the relationship between the reviewer and the paper. With the development of the topic model, Mimno and McCallum (2007) used the more advanced latent Dirichlet allocation (LDA) model and author-topic (AT) model and proposed an author-persona-topic (APT) model to better represent the topics covered by a reviewer. These methods are based on semantic information. To further mine the features of the reviewers and papers, some researchers used word-based information. Peng et al. (2017) used the term frequency-inverse document frequency (TF-IDF) to mine the statistical characteristics of reviewers and papers. They combined this approach with the topic model to propose the time-aware and topic-based (TATB) model. However, these methods neglect the constraints of the RAP: incompleteness of the reviewer data and interference from nonmanuscript-related papers in the reviewer data. We present these two challenges and their corresponding solutions below.
1.1 Incompleteness of the reviewer data
It is not practical to obtain accurate and up-to-date full texts of all reviewers' papers because data collection and processing are difficult and the data may even be multilingual. We usually take only the titles and abstracts of the reviewers' papers as the reviewer data. Such incomplete reviewer data cannot accurately and quantitatively reflect the fields (topics) of a reviewer's expertise. To resolve this problem, we use a ranking-based approach that turns the topic distribution into an ordered sequence so that the quantitative probability value of each topic can be ignored, thereby reducing the influence of inaccurate topic probability values and the overfitting to incomplete reviewer data. We first obtain the reviewer and target manuscript topics using the topic model. For the ranking-based approach, we use the normalized discounted cumulative gain (NDCG) as a similarity metric to compute the semantic similarity between the reviewer and the manuscript.
1.2 The interference from nonmanuscript-related papers
When assigning reviewers, we focus on the authors (reviewers) of papers that are highly similar to the manuscript, regardless of whether those authors have also published many papers that are unrelated to the manuscript. In contrast, when calculating full-text similarity, the score depends not only on the paragraphs that are highly similar to the query but also on the many other dissimilar paragraphs in the document. Therefore, when calculating the similarity between the reviewer's papers as a whole and the manuscript, a large number of irrelevant papers may excessively reduce the similarity between the reviewer and the manuscript. To resolve this problem, we calculate the similarity between each of the reviewer's papers and the manuscript, thus highlighting the importance of papers that are highly similar to the manuscript. We also measure the impact of low-similarity papers by calculating the similarity between the manuscript and all papers of the reviewer. This is because it is difficult to directly weigh the impact of low-similarity papers, e.g., how many low-similarity papers are equivalent to one high-similarity paper? Finally, we combine these two factors in an iterative way.
Our contributions in this paper are summarized as follows.
(1) We propose a word and semantic-based iterative model (WSIM) that, for the first time, accounts for the constraints of the reviewer assignment problem by improving the similarity metrics between reviewers and manuscripts.
(2) We use the NDCG as the similarity metric to compute the semantic similarity of the topics. This approach ignores the probability value (quantitative exact value) of each topic and considers only the ranking (qualitative relevance), thus reducing overfitting to incomplete reviewer data.
(3) We use an iterative model to reduce the interference in the assignment from nonmanuscript-related papers in the reviewer data. This approach considers the similarity between the manuscript and each of the reviewer's papers, thus reducing the importance of nonmanuscript-related papers in the reviewer data.
(4) We perform experiments on two datasets with six metrics and seven comparison algorithms to show that our model effectively addresses these challenges.
This paper is organized as follows. Section 2 describes the related research. Section 3 provides the problem formulation, explains our proposed model, describes the model learning algorithm, and introduces the applications of the model. Section 4 describes the experimental setup, the comparison methods, and the performance results. Finally, Sect. 5 concludes this paper.
2 Related work
The authors of Dumais and Nielsen (1992) were the first to discuss automated reviewer recommendations, acknowledging the importance of this task for journal editors as well as the drawbacks of manual assignment. This problem has many names, including the conference paper assignment problem (CPAP) (Goldsmith and Sloan 2007), RAP (Wang et al. 2008), paper-reviewer assignment (PRA) (Long et al. 2013), and reviewer assignment (RA) (Wang et al. 2013). Dumais and Nielsen (1992) divided the problem into two processes: selecting the most suitable reviewers for a manuscript and determining the most suitable reviewers for many manuscripts subject to assignment constraints. The former is termed retrieval-based RAP (RRAP) (Kou et al. 2015a), while the latter can be termed constrained multiaspect committee review assignment (CMACRA) (Karimzadehgan and Zhai 2009), assignment-based RAP (ARAP) (Kou et al. 2015a) or the multiagent resource allocation problem (MARA) (Lian 2018).
We use the terms RRAP and ARAP as in (Kou et al. 2015a). The ARAP focuses more on optimization issues (Yeşilçimen and Yıldırım 2019) (e.g., how many relevant manuscripts need to be assigned to each reviewer to achieve global optimization?). This paper focuses on the RRAP, whose related methods can be divided into three categories: based on semantic information, based on word information, and based on other information. The semantic information corresponds to relationships between words, typically described by topics. Word information defines the relationship between the reviewer and the manuscript through statistical word frequencies and other information. In addition to semantic information and word information, nontextual information can be used to calculate similarity, including classification information pertaining to the paper (Zhang et al. 2020a; Liu et al. 2016) and information provided by the reviewers (Rigaux 2004; Di Mauro et al. 2005). A rule-based (Di Mauro et al. 2005), collaborative filtering (Rigaux 2004) or network-based (Stelmakh et al. 2019; Xu et al. 2019; Anaya et al. 2019) method is often used for this type of information. We focus on methods based on semantics, words, and a combination of these types of information.
2.1 Semantic-based approach
Dumais and Nielsen (1992) transformed the RRAP into a retrieval problem, using latent semantic indexing (LSI) to extract semantic information and cosine similarity to calculate the similarity between the reviewer and the manuscript. LSI is a common method for extracting topic information, and Ferilli et al. (2006) and Li and Hou (2016) also used this method. pLSA (Karimzadehgan and Zhai 2012, 2009) and LDA (including variants) (Charlin and Zemel 2013; Misale and Vanwari 2017; Kim and Lee 2018) are improved methods for extracting topic information. Karimzadehgan et al. (2008) first used the pLSA model to obtain topics and calculate the similarity between the reviewer and the manuscript. Mimno and McCallum (2007) first used the LDA model to extract semantic information and proposed the APT model to improve LDA with respect to describing textual information from reviewers and manuscripts. Li and Watanabe (2013) combined the APT model with a time factor to measure the degree of expertise of reviewers. Based on this, Peng et al. (2017) employed TF-IDF to consider word information. Kou et al. (2015a) used a topic-weighted coverage calculation based on the LDA topic features and proposed a branch-and-bound algorithm (BBA) to find suitable reviewers quickly. In addition to the topic model, Ogunleye et al. (2017) used word2vec to calculate similarity. Zhao et al. (2018) transformed the RRAP into a classification problem, used the word mover's distance (WMD) method to calculate similarity, and then used the constructive covering algorithm (CCA) to simultaneously classify reviewers and manuscripts. In (Zhang et al. 2020b), the RRAP was also cast as a multilabel classification task in which the reviewers were assigned according to multiple predicted labels.
2.2 Word-based approach
The most commonly used word-based methods are keyword matching (Sidiropoulos and Tsakonas 2015; Protasiewicz et al. 2016; Shon et al. 2017; Dung et al. 2017), TF-IDF (Hettich and Pazzani 2006; Flach et al. 2010; Peng et al. 2017), and the language model (LM) (Mimno and McCallum 2007; Tang et al. 2010; Charlin et al. 2012). Tang and Zhang (2008) calculated the similarity between reviewers and manuscripts by constructing a keyword network and using cosine similarity for keyword matching. Protasiewicz (2014) added publication time information to calculate keyword weights. Dung et al. (2017) improved the keyword matching results by improving the Knuth-Morris-Pratt (KMP) algorithm. Yarowsky and Florian (1999) first used TF-IDF and cosine similarity to calculate the similarity between the reviewer and the manuscript. Basu et al. (2001) used a TF-IDF-based information integration system (WHIRL) combined with collaborative filtering. They obtained the recommendation source matrix using the scores retrieved by WHIRL. Biswas and Hasan (2007) mapped keywords to topics based on TF-IDF combined with ontology-driven inference. Protasiewicz et al. (2016) directly retrieved relevant reviewers using a full-text index based on TF-IDF. Charlin and Zemel (2013) used the LM as the similarity calculation method for the Toronto paper matching system.
2.3 Approach combining semantic and word information
Few existing methods simultaneously consider the semantic and word information of reviewers and manuscripts to capture both types of similarity between a reviewer and a manuscript. Tang et al. (2010, 2012) were the first to combine the language model and LDA to calculate the similarity between reviewers and manuscripts. Peng et al. (2017) used term frequency-inverse document frequency (TF-IDF) to mine the word information of reviewers and papers. They combined this approach with the topic model to propose the time-aware and topic-based (TATB) model.
These semantic-based or word-based approaches treat the reviewer assignment problem as an information retrieval problem but do not take into account the constraints of the reviewer assignment problem. Hence, we propose a WSIM based on LDA and LM to account for the constraints of the reviewer assignment problem by improving the similarity calculations between reviewers and manuscripts.
3 Proposed model
In this section, we first formulate the reviewer assignment problem and notation used in this paper. Then, we describe the word and semantic information extraction. Finally, we detail the ranking-based approach and iterative model for considering the constraints of the reviewer assignment problem.
3.1 Problem definition and notation
First, we define our terms in a formal way. We define a set of reviewer papers \(\mathbf {D}=\{d_1,d_2,...,d_{|\mathbf {D}|}\}\) and a set of manuscripts \(\mathbf {P}=\{p_1,p_2,...,p_{|\mathbf {P}|}\}\), where \(d_i\) and \(p_i\) denote the text information (e.g., title, abstract, etc.) of the reviewer’s paper and manuscript, respectively. We define a set of reviewers \(\mathbf {R}=\{r_1,r_2,...,r_{|\mathbf {R}|}\}\), where \(r_i\) denotes the text information (composed of \(d_j\in \mathbf {D}\)) of the reviewer.
Then, we define our problem in a formal way. Given three sets \(\mathbf {D},\mathbf {P},\mathbf {R}\) and topN (the number of reviewers required for each manuscript), our goal is to obtain the most suitable topN reviewers (a subset of \(\mathbf {R}\)) for each manuscript \(p_i\in \mathbf {P}\).
The definition of the retrieval-based RAP (RRAP) is given above. We solve this problem based on two characteristics of reviewer data. In the next subsection, we will begin to describe the proposed word and semantic-based iterative model (WSIM) for the reviewer assignment problem. Table 1 lists the notation used in the proposed model.
3.2 Feature extraction
To calculate the similarity between the reviewer and the manuscript, we need to obtain the semantic and word features of the reviewer and the manuscript. The semantic feature captures the word cooccurrence information between the topics, and the word feature captures the word cooccurrence information between the documents. These two different levels of information make the similarity calculation more comprehensive.
3.2.1 Semantic features
We use the topic model (LDA) to represent the semantic information in the reviewer publications and the manuscript text. LDA assumes that a text contains multiple topics, follows the unigram (bag-of-words) hypothesis, and obtains the topics of a text by Gibbs sampling. We use LDA on each reviewer's textual information to obtain the reviewer-topic distribution \(\theta _{mat}\):
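A plausible reconstruction of Eq. (1), assuming the standard Gibbs-sampling point estimate used in LDA and the notation defined below, is \(\theta _{m,i}=\frac{n_{m,i}+\alpha }{\sum _{k=1}^{K}n_{m,k}+K\alpha }\); the analogous smoothed estimate would apply to Eq. (2).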
where K denotes the number of topics, \(n_{m,i}\) denotes the occurrence of the ith topic within the topics covered by reviewer \(r_m\), as obtained by Gibbs sampling, and \(\alpha \) denotes the hyperparameter of the LDA model. After obtaining the reviewer-topic distribution, we can predict the manuscript-topic distribution \(\theta _{\mathbf {P}}=\{\theta _{p_1},...,\theta _{p_{|\mathbf {P}|}}\}\), where \(\theta _{p_m}\) denotes the multinomial topic distribution of manuscript \(p_m\). The topic-word distribution \(\varphi _{mat}=\{\varphi _{1},...,\varphi _{K}\}\) is similar to \(\theta _{mat}\), while m, K, and \(\alpha \) are replaced with k, V, and \(\beta \).
For consistency, the topics of each reviewer are directly represented by the topics of the reviewer’s papers. This method requires a separate representation of the reviewer’s papers. According to the reviewer-topic model, the paper-topic distribution \(\rho _{mat}\) is expressed as Eq. (2):
where \(n_{m,i}\) denotes the occurrence of the ith topic in the mth reviewer paper and \(\rho _{m,i}\) denotes the corresponding entry of \(\rho _{mat}\).
Thus, we represent the semantic features of the textual information using the topic distribution \(\theta _{mat},\theta _{\mathbf {P}},\rho _{mat}\).
3.2.2 Word features
We use the language model to represent the word information. In the language model, the relevance between a query word w and a paper \(d_i\) (or a reviewer \(r_i\)) can be expressed as the generation probability \(P_{LM}(w|d_i)\) or \(P_{LM}(w|r_i)\), as follows:
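A plausible form of this probability, assuming Dirichlet-style smoothing with the average paper length \(\lambda \) acting as the smoothing mass, is \(P_{LM}(w|d_i)=\frac{tf(w,d_i)+\lambda \cdot tf(w,\mathbf {D})/N_\mathbf {D}}{N_{d_i}+\lambda }\),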
where \(N_{d_i}\) denotes the length of paper \(d_i\), \(\lambda \) denotes the average length across all of the papers, \(tf(w,d_i)\) denotes the number of times word w appears in paper \(d_i\), \(tf(w,\mathbf {D})\) denotes the number of times word w appears in all papers \(\mathbf {D}\), and \(N_\mathbf {D}\) denotes the total length of all of the papers. The parameters in \(P_{LM}(w|r_i)\) are analogous.
The query term w is derived from any manuscript \(p_k\). To effectively capture the importance of certain low-frequency words and reduce the weight of insignificant high-frequency words, we extract the word collection \(\mathbf {p}_k\) without considering the repeated words in the manuscript \(p_k\). Different manuscripts contain different numbers of words, potentially causing an order-of-magnitude difference in the language model results for manuscripts of different lengths. To solve this problem, we sort the words in manuscript \(p_k\) and keep the collection of the first t words \(\mathbf {W}_{p_k}\), resulting in manuscripts of equal length. This process is described in Eq. (4):
Finally, we obtain the word-based similarity \(LM(p_k,d_i)\) between manuscript \(p_k\) and paper \(d_i\):
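As a rough illustration of the word-based pipeline, the following Python sketch scores one manuscript-paper pair. The Dirichlet-style smoothing, the frequency-based choice of the top-t words, and the log-sum combination are assumptions rather than the exact form of Eqs. (3)-(5), and all identifiers are hypothetical.

import math
from collections import Counter

def smoothed_word_prob(word, paper_tf, paper_len, collection_tf, collection_len, lam):
    # Interpolate the in-paper frequency with the collection frequency; lam is
    # assumed to be the average paper length, acting as the smoothing mass.
    background = collection_tf.get(word, 0) / max(collection_len, 1)
    return (paper_tf.get(word, 0) + lam * background) / (paper_len + lam)

def lm_similarity(manuscript_tokens, paper_tokens, collection_tf, collection_len, lam, t=80):
    # Deduplicate the manuscript words, keep the t most frequent ones (assumed
    # criterion for W_{p_k}), and sum log-probabilities so that manuscripts of
    # different lengths remain comparable.
    query_words = [w for w, _ in Counter(manuscript_tokens).most_common(t)]
    paper_tf = Counter(paper_tokens)
    paper_len = len(paper_tokens)
    return sum(math.log(max(smoothed_word_prob(w, paper_tf, paper_len,
                                               collection_tf, collection_len, lam), 1e-12))
               for w in query_words)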
Thus, we represent the word features of the textual information using an improved language model.
3.3 Ranking-based approach and iterative model
After obtaining the features of the reviewers and the manuscript, we detail the ranking-based approach and iterative model for considering the constraints of the reviewer assignment problem.
3.3.1 Ranking-based approach
To reduce the influence of inaccurate topic probability values, we use the NDCG as the similarity metric and turn the topic distribution into an ordered sequence so that the quantitative probability value of each topic can be ignored. This approach considers only the ranking (qualitative relevance) of the topics rather than their exact probability values, thus reducing overfitting to incomplete reviewer data.
The NDCG similarity between reviewer r and manuscript p is expressed as \({\text {NDCG}}_K(r,p)\) using \(\theta _{mat}\) and \(\theta _{\mathbf {P}}\). \({\text {NDCG}}_K(r,p)\) must be normalized to calculate the topic similarity. A topic’s NDCG (tNDCG) similarity is expressed as Eq. (6), and \({\text {tNDCG}}_K(r,d)\) is analogous.
where \({\text {iDCG}}_K\), \({\text {bDCG}}_K\), and \({\text {DCG}}_K(r,p)\) are further defined as:
where \(x(\theta _r)\) denotes the ranking of the topics by probability (in descending order). \(rank[x(\theta _r),i,x(\theta _p)]\) represents the ranking in \(x(\theta _r)\) of the topic \(k_i\) that is ranked ith in \(x(\theta _p)\). The function y denotes the rank value function, with \(y(i)=i^{-\frac{1}{2}}\). The \({\text {bDCG}}\) (bad DCG) denotes the lower bound of \({\text {DCG}}_K(r,p)\) and is used to normalize \({\text {NDCG}}_K(r,p)\).
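As a concrete illustration, the following Python sketch computes a ranking-based topic similarity in the spirit of Eqs. (6)-(7); the logarithmic position discount and the use of the fully reversed ordering for bDCG are assumptions, since only the components of the equations are stated above.

import math

def dcg(ranking_r, ranking_p):
    # ranking_r, ranking_p: topic indices sorted by descending probability.
    # The topic ranked (i+1)-th for the manuscript contributes y(rank in the
    # reviewer's ranking), with y(i) = i ** -0.5, discounted by log2(i + 2).
    pos_in_r = {topic: pos + 1 for pos, topic in enumerate(ranking_r)}
    return sum((pos_in_r[topic] ** -0.5) / math.log2(i + 2)
               for i, topic in enumerate(ranking_p))

def tndcg(theta_r, theta_p):
    # theta_r, theta_p: topic probability vectors of equal length K.
    K = len(theta_r)
    rank_r = sorted(range(K), key=lambda k: -theta_r[k])
    rank_p = sorted(range(K), key=lambda k: -theta_p[k])
    best = dcg(rank_p, rank_p)                        # iDCG: identical ordering
    worst = dcg(list(reversed(rank_p)), rank_p)       # bDCG: fully reversed ordering
    return (dcg(rank_r, rank_p) - worst) / (best - worst)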
3.3.2 Iterative model
To reduce the interference in the assignment from nonmanuscript-related papers in the reviewer data, we calculate the similarity between each reviewer’s paper and the manuscript, thus highlighting the importance of papers that are highly similar to the manuscript. Then, we measure the impact of low-similarity papers by calculating the similarity of the manuscript and all papers of the reviewer. This is because it is difficult to directly weigh the impact of low-similarity papers, e.g., how many low-similarity papers can be equivalent to one high-similarity paper? Finally, we combine these two factors in an iterative way.
When we combine these two factors using an iterative model, the similarity of one reviewer to the manuscript is influenced by the similarity of the manuscript and each paper for that reviewer, and the similarity of one reviewer’s paper to the manuscript is influenced by the similarity of each author (reviewer) to the manuscript. We can describe this with the following formula, Eq. (8):
where \(\gamma ^k[r]\) denotes the relevance of reviewer r to the manuscript at the kth iteration and \(\gamma ^k[d]\) denotes the relevance of the reviewer’s paper d to the manuscript at the kth iteration. Further, \(\xi _d\) denotes the iterative weight of the reviewer’s paper, \(\xi _r\) denotes the iterative weight of the reviewer, \(f_{rd}(r)\) denotes all of the papers of reviewer r, and \(f_{dr}(d)\) denotes all of the reviewers of the reviewer’s paper d.
In the above formula, \(\pi ^k[f_{rd}(r)]\) is essential. It denotes the relevance of reviewer r's papers to the manuscript. Because nonmanuscript-related papers can overshadow manuscript-related papers, we use the function \(\pi \) to highlight the importance of papers that are highly similar to the manuscript. By formulating the relevance of reviewer r's collection of papers \(f_{rd}(r)\) in this way, different weights can be assigned to the reviewer's papers so that papers of different levels of importance are distinguished. In the kth iteration, we determine the ranking \(\mu ^k_r\) of the relevance between all of reviewer r's papers and the target manuscript:
Similarly, the ranking \(\mu ^k_d\) of the relevance between all of the authors (reviewers) of paper d and the target manuscript is represented by Eq. (10):

To ensure the stability of the iteration, we must normalize the accumulated relevance of all of the reviewer's papers. The most relevant reviewer paper is assigned a weight of \(\eta \), and all of the reviewer's remaining papers share a total weight of \(1-\eta -(1-\eta )^h\), distributed in a recursive manner. Equation (11) shows how the function \(\pi \) assigns weights to the papers that are most relevant to the target manuscript. In the kth iteration, the relevance \(\pi ^k[f_{rd}(r)]\) of reviewer r's papers to the manuscript and the relevance \(\pi ^k[f_{dr}(d)]\) of the authors (reviewers) of paper d to the manuscript are expressed as Eq. (12):
where \(h=|f_{rd}(r)|\) denotes the number of papers authored by reviewer r, \(\eta \) denotes the weighting factor, and \(l=|f_{dr}(d)|\) denotes the number of authors (reviewers) of the reviewer’s paper d. Further, \(\mu ^k_{r,i+1}\) denotes the \((i+1)\)-ranked relevance score in \(\mu ^k_r\).
Figure 1a depicts the relationship schema between the reviewer and the reviewer’s paper, resulting in \(f_{rd}(r)\) and \(f_{dr}(d)\). Figure 1b depicts an example of the iterative process based on the relationship schema. In this example, the reviewer’s papers \(\{d_1,d_2,d_3\}\) influence reviewer \(r_1\) through \(\xi _d\cdot \pi ^k[f_{rd}(r_1)]\), and the reviewers \(\{r_1,r_2\}\) influence the reviewer’s paper \(d_3\) through \(\xi _r\cdot \pi ^k[f_{dr}(d_3)]\), all of which together form a coupled random walk.
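A minimal Python sketch of one plausible realization of this coupled update is given below. The convex combination with the initial relevance scores and the geometric form of the \(\pi \) weights (the ith-ranked item receiving \(\eta (1-\eta )^{i-1}\), so that the remaining items share \(1-\eta -(1-\eta )^h\)) are assumptions consistent with the description above, not the paper's exact Eq. (8).

def pi_weighted(scores, eta=0.25):
    # Rank-based weighting: the most relevant item receives eta, and the
    # remaining items share 1 - eta - (1 - eta)**h via weights eta * (1 - eta)**i.
    ranked = sorted(scores, reverse=True)
    return sum(eta * (1 - eta) ** i * s for i, s in enumerate(ranked))

def wsim_iterate(reviewer_papers, gamma0_r, gamma0_d, xi_d=0.05, xi_r=0.05, eta=0.25, iters=2):
    # reviewer_papers: dict reviewer -> list of paper ids (f_rd);
    # gamma0_r / gamma0_d: initial relevance of reviewers / papers to the manuscript.
    paper_reviewers = {}                               # f_dr: paper -> its authors
    for r, papers in reviewer_papers.items():
        for d in papers:
            paper_reviewers.setdefault(d, []).append(r)
    gamma_r, gamma_d = dict(gamma0_r), dict(gamma0_d)
    for _ in range(iters):
        new_r = {r: (1 - xi_d) * gamma0_r[r]
                    + xi_d * pi_weighted([gamma_d[d] for d in papers], eta)
                 for r, papers in reviewer_papers.items()}
        new_d = {d: (1 - xi_r) * gamma0_d[d]
                    + xi_r * pi_weighted([gamma_r[r] for r in authors], eta)
                 for d, authors in paper_reviewers.items()}
        gamma_r, gamma_d = new_r, new_d
    return gamma_r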

The reviewers assigned to the same manuscript are not ranked against each other. We therefore apply an averaging process to the relevance of the reviewers' papers before iteration so that the papers of the topN reviewers are not affected by this ranking. First, we sort the papers by \(\gamma ^0_d\). Then, we process all \(\gamma ^0_d\) according to the reviewers' average number of papers and the number of reviewers to be assigned to the target manuscript, and we average the relevance of the reviewers' papers accordingly. Algorithm 1 below outlines the main process of the WSIM.
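A hedged outline of the overall procedure, reusing the wsim_iterate sketch above, is as follows; here word_sim and topic_sim stand for the improved LM and tNDCG scores, and the additive combination of the two scores into the initial relevance is an assumption rather than the paper's stated rule.

def wsim_rank(manuscript, reviewers, reviewer_papers, word_sim, topic_sim, topN=20, iters=2):
    # Initial relevance of every reviewer paper and reviewer to the manuscript
    # (assumed here to be the sum of the word-based and semantic scores).
    gamma0_d = {d: word_sim(manuscript, d) + topic_sim(manuscript, d)
                for papers in reviewer_papers.values() for d in papers}
    gamma0_r = {r: word_sim(manuscript, r) + topic_sim(manuscript, r)
                for r in reviewers}
    # Coupled iteration over reviewers and their papers, then the final ranking.
    gamma_r = wsim_iterate(reviewer_papers, gamma0_r, gamma0_d, iters=iters)
    return sorted(gamma_r, key=gamma_r.get, reverse=True)[:topN]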
Thus, we use a ranking-based approach and iterative model to consider the constraints of the reviewer assignment problem and to obtain the most suitable topN reviewers for each manuscript.
4 Experiments
In this section, we evaluate the effectiveness of our WSIM method. We construct experiments using closed-world settings (Price and Flach 2017) with a fixed predetermined pool of reviewers to conduct a comparison with seven existing methods.
4.1 Dataset
Typically, journals do not disclose their specific manuscript review processes because of fairness and privacy concerns, so real review records are difficult to obtain. This makes it difficult to use existing real datasets. For example, the dataset of Karimzadehgan et al. (2008) is too small and lacks time information. The dataset of Tang et al. (2012) lacks reviewer paper information and assignment results. The datasets of Kou et al. (2015a) lack the assignment results needed for evaluation. Mimno and McCallum (2007) provide a manually assigned dataset for NIPS 2006, but it is not publicly available. Therefore, we used two data sources to construct real datasets. All datasets are released on GitHub.
4.1.1 First dataset
Table 2 describes the dataset in detail. This dataset consists of reviewer profiles, which comprise their publications (including titles, abstracts, and years) and labels. The label indicates the peer review relationship between the target manuscript and the reviewer. It uses a binary value to describe whether the reviewer can review the target manuscript.
We apply a rule to obtain the labels for the field (classification) of reviewers and manuscripts: a reviewer who has published at least 10 papers in a field corresponding to the manuscript is eligible to review that manuscript. In this setup, each reviewer has at least one field that corresponds to at least 10 papers published by that reviewer (whether or not consistent with the target manuscript), which forms the qualification for becoming a candidate reviewer.
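A minimal sketch of this labeling rule is given below; all identifiers are hypothetical, and min_papers is 10 for the first dataset (20 for the second dataset described next).

def build_labels(reviewer_field_counts, manuscript_fields, min_papers=10):
    # reviewer_field_counts: dict reviewer -> {field: number of published papers};
    # manuscript_fields: dict manuscript -> field. A reviewer is labeled eligible
    # (1) for a manuscript if he or she has published at least min_papers papers
    # in the manuscript's field, and ineligible (0) otherwise.
    return {(r, m): int(counts.get(field, 0) >= min_papers)
            for m, field in manuscript_fields.items()
            for r, counts in reviewer_field_counts.items()}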
4.1.2 Second dataset
This dataset comes from the public arXiv data source, which contains a total of 1,180,081 papers. All papers contain titles, abstracts, authors, publication time, and subject; 1,031,734 papers lack an MSC classification. We use the subject as the field. As with the first dataset, we constrain the information of reviewers and manuscripts during preprocessing. The difference is that reviewers who have published at least 20 papers in a field corresponding to the manuscript are eligible to review that manuscript. Table 3 describes the dataset in detail; we obtain 1885 reviewers and 685 manuscripts from the second dataset, which simulates a medium-sized conference.
4.1.3 Validation dataset
To find a common set of hyperparameters, we constructed a validation dataset using the same methods and data sources as the first dataset. Table 4 describes the dataset in detail.
4.2 Comparison methods
We compare our WSIM with the following seven methods, which include classic algorithms and state-of-the-art algorithms: LDA (equivalent to the author-topic model) (Mimno and McCallum 2007), LM (Charlin and Zemel 2013), LDA-LM (Tang et al. 2010), TATB (time-aware and topic-based model) (Peng et al. 2017), KCS (keyword cosine similarity) (Protasiewicz et al. 2016), BBA (Kou et al. 2015a), and WMD (Kusner et al. 2015).
LDA This method calculates the cosine similarity of the topic distribution probability between the reviewer and the manuscript to determine the appropriate reviewers for the manuscript.
LM The field of the manuscript is regarded as a query term; the method calculates the probability that the query term is present in the reviewer’s information to obtain the appropriate reviewers for the manuscript (see Eq. (3)).
LDA-LM This approach combines the results of LDA and LM to determine the appropriate reviewers for the manuscript based on the total score.
TATB Based on LDA, the papers published by reviewers are assigned different weights over time and multiplied by the results of TF-IDF to determine the appropriate reviewers based on the resulting scores.
KCS This method uses the Kea algorithm to extract the keywords of the reviewers and target manuscripts, assigns weights to the keywords with respect to the publication time of the paper in which the keyword is located and calculates the cosine similarity between the reviewer and the target manuscript.
BBA This approach uses LDA to obtain the topic distribution of the reviewers and the target manuscripts. The topic distribution of all of the reviewers for a target manuscript is considered as a whole (a group of reviewers), and the branch-and-bound method is used to quickly determine the appropriate reviewers.
WMD This approach uses word2vec to calculate the word embedding of the reviewers and the target manuscripts and then uses earth mover’s distance to calculate the similarity between the text excerpts.
4.2.1 Hyperparameters
We perform a random search (Bergstra and Bengio 2012) in the hyperparameter space using the validation dataset, with the following results. The hyperparameters in the LDA model of the WSIM and the comparison methods include the number of fields (topics) K, the hyperparameter \(\alpha \), the hyperparameter \(\beta \), and the number of iterations, which are set to 50, 0.5, 0.1, and 3000, respectively. The hyperparameters in the WSIM include t, \(\eta \), \(\xi _d\), and \(\xi _r\), which are set to 80, 0.25, 0.05, and 0.05, respectively. WMD uses 300-dimensional word embeddings. The hyperparameters used for the other comparison methods are consistent with the respective original papers. Our implementations are available on GitHub.
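A minimal sketch of such a random search is shown below; the search ranges and trial count are illustrative assumptions, not the values explored in the paper, and evaluate stands for a run of the WSIM on the validation dataset.

import random

search_space = {
    "t":    lambda: random.choice([50, 80, 110, 140]),
    "eta":  lambda: random.uniform(0.05, 0.5),
    "xi_d": lambda: random.uniform(0.01, 0.2),
    "xi_r": lambda: random.uniform(0.01, 0.2),
}

def random_search(evaluate, n_trials=50):
    # evaluate(params) returns the validation precision for one configuration.
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: draw() for name, draw in search_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score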
4.3 Evaluation metrics
We use the methods to find the topN reviewers for each manuscript and compare the result of each method with the labels in the dataset. We use the precision, recall, and F1 score as evaluation metrics. We also employ several popular information retrieval measures (Büttcher and Clarke 2016) including mean averaged precision (MAP), normalized discounted cumulative gain (NDCG), and bpref (Buckley and Voorhees 2004). The metrics are defined below:
where \(N=|\mathbf {P}|\), TP denotes the number of true positives, FP denotes the number of false positives, and FN denotes the number of false negatives. In addition, \(n=topN\), \(R_n\) is the number of reviewers who are eligible to review the target manuscript, and \(R(c_i)=1\) if the i-th retrieved candidate is relevant to the target manuscript and \(R(c_i)=0\) otherwise.
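For reference, a minimal Python sketch of the per-manuscript precision, recall, F1, and average precision (the quantity averaged over manuscripts to obtain MAP) is given below; the normalization of average precision by min(R_n, topN) is an assumption.

def precision_recall_f1(recommended, relevant):
    # recommended: the topN retrieved reviewers; relevant: the set of reviewers
    # labeled as eligible for the target manuscript.
    tp = sum(1 for c in recommended if c in relevant)
    precision = tp / len(recommended)
    recall = tp / len(relevant)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(recommended, relevant):
    # Averaging this value over all manuscripts yields MAP.
    hits, ap_sum = 0, 0.0
    for i, c in enumerate(recommended, start=1):
        if c in relevant:
            hits += 1
            ap_sum += hits / i
    return ap_sum / min(len(relevant), len(recommended)) if relevant else 0.0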
4.4 Experimental results
We examined the performance of the WSIM and the comparison methods when topN was 10, 20, 30, and 50. Tables 5 and 6 show the results of the WSIM and the seven comparison methods with respect to the precision, recall, F1 score, MAP, NDCG, and bpref metrics on the first and second datasets. The results indicate that our proposed WSIM is superior to all of the comparison methods, including the latest RRAP method (TATB), demonstrating its effectiveness in overcoming the stated challenges. The WSIM is better than the three types of methods it is based on, namely, LM (word-based), LDA (semantic-based), and LDA-LM (word and semantic-based). The WSIM also outperforms the other methods: KCS (word-based), BBA (semantic-based), TATB (word and semantic-based), and WMD (word embedding or semantic-based). This is because we consider the constraints of the RAP rather than treating it purely as an information retrieval problem.
Here, some analyses of the comparison methods are presented: (1) The performance of LDA is weaker than that of LM, which is better suited for short texts (titles and abstracts). (2) The direct combination of LDA and LM does not improve performance (Table 5) because the two components are not complementary, and errors from one component can adversely affect the combined result. (3) TATB uses the TF-IDF method but does not suitably represent the word information. (4) KCS uses keyword weight calculations but does not represent semantic information. (5) BBA uses topic-based coverage to calculate relevance without considering word information, and this coverage only finds an appropriate reviewer group for the target manuscript; it does not ensure that each reviewer is appropriate for the target manuscript. (6) WMD uses semantic information but does not consider the constraints of the RAP.
4.5 Ablation analysis
We conduct an ablation analysis on the WSIM to examine the effectiveness of each component, including the improved LM, ranking-based approach and iterative model. First, for improved LM, we list the existing method LM and its improved method improved LM (I-LM). Second, for the ranking-based approach, we list the existing LDA method and its improved LDA-NDCG method. The original LDA method uses cosine similarity, and we further compare Euclidean distance (LDA-ED) and Jensen-Shannon divergence (LDA-JS). Finally, for the iterative model, we list the existing LDA-LM and improved WSIM methods, including the LDA-NDCG+improved LM (LDA-NDCG+I-LM), which is a zero-iterative WSIM. Tables 7 and 8 show the precision of these methods on the first dataset and second dataset, respectively, with topN values of 10, 20, 30, and 50. The underline indicates the best result in the current component. The bold font is the best result in the current column. We have the following observations and analysis:
(1) Iterative models are helpful for performance improvement. The performance of the WSIM exceeds that of LDA-LM in all cases and that of LDA-NDCG+I-LM in 75% of cases. This is because the iterative model reduces the interference in the assignment from nonmanuscript-related papers in the reviewer data. (2) The ranking-based approach is helpful for performance improvement. The performance of LDA-NDCG exceeds that of LDA, LDA-ED, and LDA-JS. This is because the ranking-based approach reduces the influence of inaccurate topic probability values. (3) The improved LM is helpful for performance improvement. I-LM outperforms LM in 75% of the results, mainly because I-LM alleviates the problem caused by the inconsistent text lengths in the LM method.
We explore the influence of the number of iterations on the algorithm performance. Figure 2 shows the precision (topN=20) for different numbers of iterations. Performing zero iterations means that the interference in the assignment from nonmanuscript-related papers is not considered. On the first dataset, increasing the number of iterations from one to two improves performance because the importance of papers that are highly similar to the manuscript is emphasized more with additional iterations. On the second dataset, a single iteration yields the best performance. As the number of iterations increases, the proportion of \(\pi ^k[f_{rd}(r)]\) in \(\gamma ^k[r]\) also increases. \(\pi ^k[f_{rd}(r)]\) highlights the importance of papers that are highly similar to the manuscript, whereas \(\gamma ^0[r]\) reflects the importance of all of the reviewer's papers, and both are indispensable. Therefore, continuing to increase the number of iterations can overstate the importance of papers that are highly similar to the manuscript and degrade the final performance.
4.6 Significance test
In this subsection, we analyze the statistical significance of the performance improvement through a significance test. We randomly divide all manuscripts into ten folds through tenfold cross-validation and then compare the precision of the WSIM and all comparison methods through a two-sided paired t-test (Smucker et al. 2007). Table 9 shows the mean, t-value, and p-value of precision (topN=20) on the two datasets. The first row is a paired t-test of the WSIM against itself, which, as expected, shows no difference. The confidence level of the WSIM over the other methods is at least 97.5% on both datasets. This shows that the performance improvement of the WSIM is statistically significant.
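A minimal sketch of this test using SciPy is shown below; the per-fold precision arrays are supplied by the cross-validation run and are not reproduced here.

from scipy import stats

def paired_significance(precision_wsim, precision_baseline):
    # Two-sided paired t-test over the per-fold precision values of two methods;
    # returns the t-value and p-value.
    t_value, p_value = stats.ttest_rel(precision_wsim, precision_baseline)
    return t_value, p_value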
For a more comprehensive analysis of the statistical significance between the performance of all methods, we extended the results presented in Table 9 to all methods and all metrics. Table 10 shows the results. The upper part of Table 10 shows whether each method (column) passes the significance test of outperforming each method (row) on six metrics (in the order of P, R, \(\text {F}_1\), MAP, NDCG, and bpref). We set the confidence level at 97.5% and record 1 if it passes the significance test at this confidence level; otherwise, we record 0. For example, “101111” means that only the recall R does not pass the significance test. The bottom half of Table 10 shows the lowest value of the confidence level among the six metrics. For example, in the first dataset, the lowest confidence level at which WSIM outperforms LDA-LM is 0.9517, which is the confidence level of recall R, as seen in the upper part of Table 10. We can obtain the performance ranking between different methods: \(\text {WSIM}>\text {LDA-LM}\ge \text {LM}\ge \text {WMD}>\text {TATB}\ge \text {LDA}>\text {BBA}\ge \text {KCS}\). From Table 10, we can see that this ranking’s confidence level is at least 95%.
4.7 Bias-variance decomposition
In this subsection, we analyze the generalizability of all methods, and we perform a bias-variance decomposition on the precision of each manuscript. We use precision=1.0 as the true output and calculate the bias between it and each method’s precision. The generalization error is equal to the square of the bias plus the variance. Table 11 shows the bias, variance, and generalization error for each method (topN=20) on both datasets. The bias and generalization error of the WSIM are minimal. The variance of the WSIM is almost identical to that of LDA-LM and LM on which it is based. This proves the excellent generalization capability of WSIM, as the WSIM reduces the bias while maintaining the variance.
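A minimal sketch of this decomposition over the per-manuscript precision values of one method:

import numpy as np

def bias_variance(per_manuscript_precision, target=1.0):
    # Bias is measured against the ideal precision of 1.0; the generalization
    # error is the squared bias plus the variance, as described above.
    prec = np.asarray(per_manuscript_precision, dtype=float)
    bias = target - prec.mean()
    variance = prec.var()
    return bias, variance, bias ** 2 + variance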
To further determine whether the performance gain comes from a few manuscripts or from many manuscripts, we show the precision bias between each method and the WSIM on each manuscript. The violin plot in Fig. 3 shows the distribution of the precision bias. Each violin in the figure shows the precision bias of a method on each manuscript and is labeled with the maximum, minimum, mean, and median of the precision (topN=20). The wider the violin is, the more manuscripts are in that position. From the figure, we can observe that (1) the median value of the precision bias for each comparison method is below zero; and (2) the distribution of the precision bias is close to a normal distribution. This shows that our method improves the performance on most manuscripts and that the improvement is approximately normally distributed.
4.8 Hyperparameter analysis
In this subsection, we show the performance of important hyperparameters of the WSIM at different values to investigate the impact of different hyperparameter values on the performance. We use the method of control variables to analyze the four most important parameters (\(t,\eta ,\xi _d,\xi _r\)) in the WSIM. Table 12 shows the six values used for each hyperparameter, and the values given in Section 4.2 are in bold. When the value of a hyperparameter is a variable, the other parameters will use fixed boldface values. Figure 4 shows the experimental results of the WSIM on four hyperparameters, six hyperparameter values, six metrics, and two datasets. The horizontal coordinates show the hyperparameter values. The vertical coordinate shows the performance at the current hyperparameter value minus the average performance of the six hyperparameter values. From the range of values of vertical coordinates, we can obtain the following conclusions: (1) the influence of hyperparameters on performance can be ordered as \(t>\eta>\xi _d>\xi _r\); (2) the influence of hyperparameters \(\eta ,\xi _d,\xi _r\) on performance is less than 0.7%; and (3) when hyperparameter \(t>110\), its influence on performance tends to be stable.
4.9 Case study
In this subsection, we provide a case study analysis to show the effectiveness of the WSIM with respect to the experimental evaluation and illustrate the practicality of the method.
To illustrate the effectiveness of the WSIM, we show the matching results of two manuscripts (\(p_1\),\(p_2\)) in the first dataset. To be reasonable, we chose two test samples with single-sample precision approximating the evaluation results (50.25%). The precision (topN=20) of these two manuscripts is 50% and 55%, respectively. Among the reviewers recommended for the manuscript, we focus on five reviewers corresponding to matching errors to show that the WSIM is more effective than the results of the evaluation metric. We use the title and the related fields (classification) of the paper to display the manuscript and reviewer’s information. The reviewer’s title and related fields are obtained from the most similar reviewer papers with respect to the target manuscript.
Table 13 shows the matching results for the two manuscripts, where the fields use symbolic representations, and Table 14 explains the names of the fields corresponding to the symbols. Among the five reviewers matched to manuscript \(p_1\), reviewers \(\{r_{11},r_{12},r_{13},r_{14}\}\) are truly suitable. Among the five reviewers matched to manuscript \(p_2\), reviewers \(\{r_{21},r_{22}\}\) are truly suitable. The reviewers corresponding to matching errors still possess many of the qualifications appropriate for reviewing the target manuscript. This is because the ground truth used by the evaluation metrics is strict, and relying only on the labels disqualifies some suitable reviewers.
This analysis shows that the WSIM is more effective and practical than the results of evaluation metrics.
5 Conclusions
We proposed an approach named the word and semantic-based iterative model (WSIM) to solve the retrieval-based reviewer assignment problem (RRAP). The WSIM determines the most appropriate reviewers for a target manuscript using a combination of word information and semantic information and considering the constraints of the RAP by improving the similarity calculations between reviewers and manuscripts. We reduce overfitting to incomplete reviewer data and the interference in the assignment from nonmanuscript-related papers in the reviewer data with a ranking-based approach and iterative model. We compare our approach with seven existing methods in closed-world settings, and the experimental results validate the effectiveness of our method.
The RAP includes the retrieval-based RAP, which we address in this paper, and the assignment-based RAP, which requires different strategies for different requirements (O’Dell et al. 2005) and is an interesting problem for future research. In the future, we also plan to provide an efficient system based on our proposed method for use by journals and conferences. In addition, we plan to explore how our methods can be applied to other research topics, such as information retrieval and question answering.
References
Anaya, Antonio R., Luque, Manuel, Letón, Emilio, & Hernández-del-Olmo, Félix. (2019). Automatic assignment of reviewers in an online peer assessment task based on social interactions. Expert Systems with Applications, 36(4), e12405. (ISSN 0266-4720.).
Basu, Chumki, Hirsh, Haym, Cohen, William W., & Nevill-Manning, Craig. (2001). Technical paper recommendation: A study in combining multiple information sources. Journal of Artificial Intelligence Research, 14, 231–252.
Bergstra, James, & Bengio, Yoshua. (2012). Random search for hyper-parameter optimization. Journal of machine learning research, 13(2), 281–305.
Biswas, Humayun Kabir & Hasan, Md Maruf. (2007). Using publications and domain knowledge to build research profiles: An application in automatic reviewer assignment. In Information and Communication Technology, 2007. ICICT’07. International Conference on, pages 82–86. IEEE
Buckley, Chris, & Voorhees, Ellen M. (2004). Retrieval evaluation with incomplete information. In Proceedings of the 27th annual International ACM SIGIR Conference on research and development in information retrieval, pages 25–32. ACM
Büttcher, Stefan, Clarke, Charles L. A., & Cormack, Gordon V. (2016). Information retrieval: Implementing and evaluating search engines. MIT Press.
Charlin, Laurent, & Zemel, Richard. (2013). The Toronto paper matching system: An automated paper-reviewer assignment system. JMLR: W&CP, 28.
Charlin, Laurent, Zemel, Richard S., & Boutilier, Craig. (2012). A framework for optimizing paper matching. arXiv preprint arXiv:1202.3706
Di Mauro, Nicola, Basile, Teresa MA, Ferilli, Stefano. (2005). Grape: An expert review assignment component for scientific conference management systems. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pages 789–798. Springer
Dumais, Susan T., & Nielsen, Jakob. (1992). Automating the assignment of submitted manuscripts to reviewers. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and development in information retrieval, pages 233–244. ACM
Dung, Nguyen Dinh, Cong, Nguyen Huu, & Anh, Nguyen Tuan. (2017). Algorithm of dynamic programming for paper-reviewer assignment problem. IRJET, 04(11)
Ferilli, Stefano, Di Mauro, Nicola, Maria Altomare Basile, Teresa, Esposito, Floriana, & Biba, Marenglen. (2006). Automatic topics identification for reviewer assignment. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pages 721–730. Springer
Flach, Peter A., Spiegler, Sebastian, Golénia, Bruno, Price, Simon, Guiver, John, Herbrich, Ralf, et al. (2010). Novel tools to streamline the conference review process: Experiences from sigkdd’09. ACM SIGKDD Explorations Newsletter, 11(2), 63–67.
Goldsmith, Judy, & Sloan, Robert H. (2007). The AI conference paper assignment problem. In Proc. AAAI Workshop on Preference Handling for Artificial Intelligence, pages 53–57. Vancouver
Hettich, Seth, & Pazzani, Michael J. (2006). Mining for proposal reviewers: lessons learned at the national science foundation. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge discovery and data mining, pages 862–871. ACM
Hoang, Dinh Tuye, Nguyen, Ngoc Thanh, & Hwang, Dosam. (2019). Decision support system for assignment of conference papers to reviewers. In International Conference on Computational Collective Intelligence, pages 441–450. Springer
Karimzadehgan, Maryam, & Zhai, ChengXiang. (2009). Constrained multi-aspect expertise matching for committee review assignment. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1697–1700. ACM
Karimzadehgan, Maryam, & Zhai, ChengXiang. (2012). Integer linear programming for constrained multi-aspect committee review assignment. Information Processing and Management, 48(4), 725–740.
Karimzadehgan, Maryam, Zhai, ChengXiang, & Belford, Geneva. (2008). Multi-aspect expertise matching for review assignment. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 1113–1122. ACM
Kim, Jungil, & Lee, Eunjoo. (2018). Understanding review expertise of developers: A reviewer recommendation approach based on latent dirichlet allocation. Symmetry, 10(4), 114.
Kou, Ngai Meng, Hou, U Leong, Mamoulis, Nikos, Gong, Zhiguo. (2015a). Weighted coverage based reviewer assignment. In Proceedings of the 2015 ACM SIGMOD International Conference on management of data, pages 2031–2046. ACM
Kou, Ngai Meng, Mamoulis, Nikos, Li, Yuhong, Gong, Zhiguo, Li, Ye, et al. (2015b). A topic-based reviewer assignment system. Proceedings of the VLDB Endowment, 8(12), 1852–1855
Kusner, Matt, Sun, Yu, Kolkin, Nicholas, & Weinberger, Kilian. (2015). From word embeddings to document distances. In International Conference on Machine Learning, pages 957–966
Li, Baochun, & Hou, Y. Thomas. (2016). The new automated IEEE INFOCOM review assignment system. IEEE Network, 30(5), 18–24.
Li, Xinlian, & Watanabe, Toyohide. (2013). Automatic paper-to-reviewer assignment, based on the matching degree of the reviewers. Procedia Computer Science, 22, 633–642.
Lian, Jing Wu, Mattei, Nicholas, Noble, Renee, & Walsh, Toby. (2018). The conference paper assignment problem: Using order weighted averages to assign indivisible goods.
Liu, Ou., Wang, Jun, Ma, Jian, & Sun, Yonghong. (2016). An intelligent decision support approach for reviewer assignment in r&d project selection. Computers in Industry, 76, 1–10.
Long, Cheng, Wong, Raymond Chi-Wing, Peng, Yu, & Ye, Liangliang. (2013). On good and fair paper-reviewer assignment. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1145–1150. IEEE
McGlinchey, Noel, Hunter, Tom, Bromley, Jack, Fisher, Ruth, Debiec-Waszak, Anna, & Gaston, Thomas. (2019). Do journal administrators solve the reviewer assignment problem as well as editors? consideration of reviewer rigour and timeliness. Learned Publishing, 32(1), 37–46. (ISSN 0953-1513.).
Mimno, David, McCallum, Andrew. (2007). Expertise modeling for matching papers with reviewers. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge discovery and data mining, pages 500–509. ACM
Misale, Mohini, & Vanwari, Pankaj. (2017). A survey on recommendation system for technical paper reviewer assignment. In Electronics, Communication and Aerospace Technology (ICECA), 2017 International conference of, volume 2, pages 329–331. IEEE
O’Dell, Regina, Wattenhofer, Mirjam, & Wattenhofer, Roger. (2005). The paper assignment problem. Technical report 491, Department of Computer Science, Swiss Federal Institute of Technology Zurich.
Ogunleye, O., Ifebanjo, T., Abiodun, T., & Adebiyi, AA. (2017). Proposed framework for a paper-reviewer assignment system using word2vec
Peng, Hongwei, Hu, Haojie, Wang, Keqiang, & Wang, Xiaoling. (2017). Time-aware and topic-based reviewer assignment. In International Conference on Database Systems for Advanced Applications, pages 145–157. Springer
Price, Simon, & Flach, Peter A. (2017). Computational support for academic peer review: A perspective from artificial intelligence. Communications of the ACM, 60(3), 70–79.
Protasiewicz, Jarosław. (2014). A support system for selection of reviewers. In Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on, pages 3062–3065. IEEE
Protasiewicz, Jaroslaw, Pedrycz, Witold, Kozlowski, Marek, Dadas, Slawomir, Stanislawek, Tomasz, Kopacz, Agata, & Galezewska, Malgorzata. (2016). A recommender system of reviewers and experts in reviewing problems. Knowledge-Based Systems, 106, 164–178.
Rigaux, Philippe. (2004). An iterative rating method: application to web-based conference management. In Proceedings of the 2004 ACM symposium on Applied computing, pages 1682–1687. ACM
Shon, Ho Sun, Han, Sang Hun, Kim, Kyung Ah, Cha, Eun Jong, & Ryu, Keun Ho. (2017). Proposal reviewer recommendation system based on big data for a national research management institute. Journal of Information Science, 43(2), 147–158.
Sidiropoulos, Nicholas D., & Tsakonas, Efthymios. (2015). Signal processing and optimization tools for conference review and session assignment. IEEE Signal Processing Magazine, 32(3), 141–155.
Smucker, Mark D., Allan, James, & Carterette, Ben. (2007). A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 623–632
Stelmakh, Ivan, Shah, Nihar B., & Singh, Aarti. (2019). PeerReview4All: Fair and accurate reviewer assignment in peer review. Algorithmic Learning Theory, 98, 827–855.
Tang, Wenbin, Tang, Jie, Tan, Chenhao. (2010). Expertise matching via constraint-based optimization. In Web intelligence and intelligent agent technology (wi-iat), 2010 IEEE/wic/acm International Conference on, volume 1, pages 34–41. IEEE
Tang, Wenbin, Tang, Jie, Lei, Tao, Tan, Chenhao, Gao, Bo., & Li, Tian. (2012). On optimization of expertise matching with various constraints. Neurocomputing, 76(1), 71–83.
Tang, Xijin, & Zhang, Zhengwen. (2008). Paper review assignment based on human-knowledge network. In Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on, pages 102–107. IEEE
Tayal, Devendra Kumar, Saxena, P. C., Sharma, Ankita, Khanna, Garima, & Gupta, Shubhangi. (2014). New method for solving reviewer assignment problem using type-2 fuzzy sets and fuzzy functions. Applied Intelligence, 40(1), 54–73.
Wang, Fan, Chen, Ben, Miao, Zhaowei. (2008). A survey on reviewer assignment problem. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pages 718–727. Springer
Wang, Fan, Zhou, Shaorui, & Shi, Ning. (2013). Group-to-group reviewer assignment problem. Computers and Operations Research, 40(5), 1351–1362.
Xu, Yichong, Zhao, Han, Shi, Xiaofei, & Shah, Nihar B. (2019). On strategyproof conference peer review, pages 616–622
Yarowsky, David, & Florian, Radu. (1999). Taking the load off the conference chairs-towards a digital paper-routing assistant. In 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Yeşilçimen, Ali, & Yıldırım, E Alper. (2019). An alternative polynomial-sized formulation and an optimization based heuristic for the reviewer assignment problem. European Journal of Operational Research, 276(2), 436–450.
Zhang, Dong, Zhao, Shu, Duan, Zhen, Chen, Jie, Zhang, Yanping, & Tang, Jie. (2020a). A multi-label classification method using a hierarchical and transparent representation for paper-reviewer recommendation. ACM Transactions on Information Systems, 38(1), 1–20. (ISSN 1046-8188.).
Zhang, Dong, Zhao, Shu, Duan, Zhen, Chen, Jie, Zhang, Yanping, & Tang, Jie. (2020b). A multi-label classification method using a hierarchical and transparent representation for paper-reviewer recommendation. ACM Transactions on Information Systems (TOIS), 38(1), 1–20.
Zhao, Shu, Zhang, Dong, Duan, Zhen, Chen, Jie, Zhang, Yan-ping, & Tang, Jie. (2018). A novel classification method for paper-reviewer recommendation. Scientometrics, pages 1–21
Acknowledgements
This work was partially supported by National High Technology Research and Development Program (Grant # 2017YFB1401903), the National Natural Science Foundation of China (Grants # 61876001, # 61602003 and # 61673020), the Provincial Natural Science Foundation of Anhui Province (Grants # 1708085QF156), and the Recruitment Project of Anhui University for Academic and Technology Leader.