
1 Introduction

In community question answering (CQA) services, information seekers ask questions, and other users share their knowledge by answering them. Owing to their usefulness, these services have attracted a large number of users and gained notable popularity. Some of the trending CQA services are Yahoo! Answers, Answerbag, Wiki answer, Baidu Zhidao, and Stackoverflow.

In CQA services, question routing (QR) is the process of routing a new question to its potential answerers. Let \(\hat{q}\) be a new question represented by a sequence of terms (non-stop words) \(t_j\), \(j \in \{1, 2, \cdots , |\hat{q}|\}\), i.e., \({\hat{q}}=\{{t_1,t_2,\cdots ,t_{|\hat{q}|}}\}\), and let A be the set of answerers. QR can then be defined as the procedure of selecting a set of suitable answerers \({A{_{\hat{q}}}^*}\subseteq A\) who have expertise on \(\hat{q}\). QR schemes generally follow three steps: (i) generation of answerers' performance profiles, (ii) estimation of answerers' expertise based on their performance profiles, and (iii) routing of the question to the potential experts in the resulting expert list. Here, the performance profile of an answerer contains her answering history. It is usually taken to be the collection of all questions that she has previously answered; in our work, however, the performance profile also stores each question's posting timestamp and answering timestamp. Within QR, expert finding (EF) is the process of identifying the likely experts for a given question. Let \(Exp({{a_i},\hat{q}})\) be the expertise score of answerer \(a_i \in A\) for \(\hat{q}\). The system ranks all answerers by their expertise scores and selects the top N \((N=|{A{_{\hat{q}}}^*}|)\) ranked answerers as the experts to whom \(\hat{q}\) is routed.

The performance of CQA services may degrade if the experts to whom a question is routed are not currently active in the community. In that case, the asker may have to wait too long for the first suitable answer for it to be useful. Therefore, in QR, it is desirable to route a question to experts who are currently available online. We call such users active experts.

Here, we propose an active expert finding (AEF) method. It uses the query likelihood language (QLL) model to estimate an answerer's expertise and incorporates the answerer's activeness at the time the question is posted. Beyond estimating an answerer's availability at question posting time, we also consider her answering intensity during the same period. To show the effectiveness of the proposed method, we compare it with a baseline EF system on a real-world dataset downloaded from the Yahoo! Answers website, using three performance measures. In every comparison scenario, the proposed AEF system outperforms the baseline EF system.

2 Related Works

The literature on EF is extensive [1, 2, 4,5,6, 8,9,10, 14, 18]. To keep this paper concise, however, we discuss only some recent prominent works that deal with the availability of answerers [3, 7, 11, 15, 16].

In [7], the authors proposed a QR model that uses a language model to estimate a user's expertise. This model integrates the quality of each answerer's previous answers with her login availability, and it treats the subproblem of predicting a user's login availability as a time series forecasting problem. In [3], the authors proposed a recommendation system that takes into account the compatibility, topical expertise, and availability of users; to estimate availability, they applied three classification techniques to the users' previous activities. In another work [16], the researchers proposed a dynamic modelling approach for QR that uses two temporal discounting functions to model the availability of users. While modelling the EF problem, the authors of [11] considered its dynamic aspects and proposed a supervised learning framework. The authors of [15] proposed a QR technique that routes questions to the users most suitable to answer them based on their past and recent activities. To assess users' activities, they [15] proposed a measure that divides each user's activities into four time-based categories and assigns a weight to each category so that more recent categories receive higher importance.

3 Proposed Method

The present investigation aims to formulate a method for finding active experts for a new question in order to improve the effectiveness of QR schemes. The method consists of four phases: (i) estimation of expertise, (ii) estimation of activeness, (iii) estimation of answering intensity, and (iv) estimation of the active experts. We now discuss these phases in detail.

3.1 Estimation of Expertise

In information retrieval, the QLL model [12] is used to estimate the similarity between a document and a given query. Treating answerers' performance profiles as documents, we apply it to estimate the expertise of each answerer for a given question.

Let us assume that \({\theta _{a_i}}\) denotes the language model associated with the performance profile of the answerer \(a_i\), and \(\theta _C\) denotes the language model associated with the entire collection of performance profiles. Then, the expertise score of \(a_i\) for a given question \(\hat{q}\) is computed with the help of [12] as

$$\begin{aligned} \displaystyle Exp(a_i,\hat{q})= P(\hat{q}|{\theta _{a_i}}) = \prod _{t\in \hat{q}} P(t|{\theta _{a_i}})^{n(t,\hat{q})}, \end{aligned}$$
(1)

where \(n(t,\hat{q})\) is the number of occurrences of t in \(\hat{q}\), and the product runs over the distinct terms of \(\hat{q}\). To avoid zero probabilities, we apply Jelinek-Mercer smoothing [17] to (1) and obtain the following:

$$\begin{aligned}&\displaystyle P(\hat{q}|{\theta _{a_i}}) = \prod _{t\in \hat{q}} \{ \lambda p(t|{\theta _{a_i}}) + (1 - \lambda ) p(t|{\theta _C})\}^{n(t,\hat{q})} \end{aligned}$$
(2)
$$\begin{aligned}&\text {where} \nonumber \\&\displaystyle p(t|{\theta _{a_i}}) = \dfrac{tf(t,{\theta _{a_i}})}{\sum _{{t^{\prime }}\in {\theta _{a_i}}} tf({t^{\prime }},{\theta _{a_i}})}; \end{aligned}$$
(3)
$$\begin{aligned}&\text {and} \nonumber \\&\displaystyle p(t|{\theta _C}) = \dfrac{tf(t,{\theta _C})}{{\sum _{{t^{\prime }}\in {\theta _C}}} tf({t^{\prime }},{\theta _C})}. \end{aligned}$$
(4)

Here, \(tf(t,{\theta })\) is the frequency of term t in \({\theta }\) (either \(\theta _{a_i}\) or \(\theta _C\)), and \({{\sum _{{t^{\prime }}\in {\theta }}} tf({t^{\prime }},{\theta })}\) is the total number of term occurrences in \({\theta }\). The smoothing parameter \(\lambda \) \((0<\lambda <1)\) controls the relative influence of \(\theta _{a_i}\) and \(\theta _C\).
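The following is a minimal sketch of Eqs. (2) to (4), assuming that each performance profile has already been reduced to a list of preprocessed terms and that profiles are stored in a dictionary keyed by answerer id; the function names and data layout are hypothetical. The product is accumulated as a sum of logarithms, so the function returns \(\log Exp(a_i,\hat{q})\); exponentiating it recovers Eq. (2).

```python
import math
from collections import Counter


def collection_model(profiles):
    """Term counts and total term count of theta_C, pooled over all profiles (Eq. (4))."""
    counts = Counter()
    for terms in profiles.values():
        counts.update(terms)
    return counts, sum(counts.values())


def expertise_log_score(question, profile, coll_counts, coll_total, lam=0.5):
    """log Exp(a_i, q) from Eq. (2), accumulated in log space to avoid underflow."""
    prof_counts = Counter(profile)
    prof_total = sum(prof_counts.values())
    log_p = 0.0
    for term, n_tq in Counter(question).items():  # distinct terms t with n(t, q)
        p_a = prof_counts[term] / prof_total if prof_total else 0.0  # Eq. (3)
        p_c = coll_counts[term] / coll_total  # Eq. (4)
        mixed = lam * p_a + (1 - lam) * p_c  # Jelinek-Mercer mixture
        if mixed == 0.0:  # term unseen in the whole collection
            return float("-inf")
        log_p += n_tq * math.log(mixed)
    return log_p


# Hypothetical stemmed profiles of two answerers.
profiles = {"alice": ["roman", "empir", "roman"], "bob": ["ww2", "tank"]}
counts, total = collection_model(profiles)
for name, terms in profiles.items():
    print(name, math.exp(expertise_log_score(["roman"], terms, counts, total)))
```

Because of the collection term, bob still receives a non-zero score for "roman" even though the term never occurs in his profile, which is exactly the zero-probability problem that smoothing addresses; alice, who used the term twice, scores higher.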

3.2 Estimation of Activeness

To incorporate the activeness of an answerer in the proposed method, we define the activeness score of an answerer \({a_i}\) as

$$\begin{aligned} \displaystyle \mathcal {AS}(a_i)=\exp \left( -\left( {\mathtt {time}_{c}}-{\mathtt {time}_{a_i}}\right) /{{(24 \times 7)}}\right) , \end{aligned}$$
(5)

where \({\mathtt {time}_{c}}\) is the current system time, \({\mathtt {time}_{a_i}}\) is the last answering time of \(a_i\), and the difference \(({\mathtt {time}_{c}}-{\mathtt {time}_{a_i}})\) is measured in hours. Note that \(\mathcal {AS}(\cdot )\) decreases exponentially with this difference, so an answerer who has answered recently obtains a high activeness score. Since the difference is divided by \(24 \times 7 = 168\) h (one week), the score falls by a factor of e for each week that has passed since the answerer's last answer.
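A minimal sketch of Eq. (5), assuming both times are naive `datetime` objects in the same time zone; the function name `activeness_score` is hypothetical. The gap is converted to hours before dividing by \(24 \times 7\).

```python
import math
from datetime import datetime


def activeness_score(now, last_answer_time):
    """Eq. (5): exp(-(time_c - time_ai) / (24 * 7)), with the gap measured in hours."""
    gap_hours = (now - last_answer_time).total_seconds() / 3600.0
    return math.exp(-gap_hours / (24 * 7))


# An answerer whose last answer was exactly one week earlier scores exp(-1).
score = activeness_score(datetime(2013, 8, 8, 12), datetime(2013, 8, 1, 12))
```

An answerer who answered at the current time receives the maximum score of 1, and the score approaches 0 as the gap grows.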

3.3 Estimation of Answering Intensity

A suitable expert should not only be active (as argued in the previous section) but should also participate heavily in answering questions. Accordingly, we introduce a new measure, called answering intensity, that incorporates each answerer's answering activity and consistency during each hour of the day.

To this end, we construct an hourly answering activity matrix, denoted by \(\mathcal {HA}\), of size \(|A|\times |H|\), where \(\mathcal {HA}_{a_i, h}\) is the total number of answers submitted by answerer \(a_i\) (\(a_i \in A\)) during the \(h^{th}\) hour of the day (\(h \in H=\{0, 1, \cdots , 23\}\)). Then, similar to [3], \(a_i\)'s answering activity during the \(h^{th}\) hour is estimated as

$$\begin{aligned} \displaystyle Ans_{act}({a_i},h)=\dfrac{ {\mathcal {HA}_{a_i, h}}}{ {\sum _{\hat{h}\in H}\mathcal {HA}_{a_i, \hat{h}}}}. \end{aligned}$$
(6)

Next, to take into account the consistency of an answerer, we construct an hourly consistency matrix, denoted by \(\mathcal {HC}\), of size \(|A| \times |H|\), where \(\mathcal {HC}_{a_i, h}\) is the number of days (within the period under consideration) on which answerer \(a_i\) answered during the \(h^{th}\) hour. We then estimate \(a_i\)'s answering consistency during the \(h^{th}\) hour as

$$\begin{aligned} \displaystyle \mathcal {H}_{con}({a_i},h)= \dfrac{{\mathcal {HC}_{a_i, h}}}{ {\sum _{\hat{h}\in H}\mathcal {HC}_{a_i, \hat{h}}}}. \end{aligned}$$
(7)

Then, we calculate the answering intensity of \({a_i}\) during the \(h^{th}\) hour as

$$\begin{aligned} \displaystyle \mathcal {I}_{ans}({a_i},h)= {Ans_{act}({a_i},h)} \times {\mathcal {H}_{con}({a_i},h)}. \end{aligned}$$
(8)
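A minimal sketch of Eqs. (6) to (8), assuming the answering history is available as hypothetical `(answerer_id, answer_datetime)` pairs. Each answerer's row of \(\mathcal {HA}\) counts answers per hour, and each row of \(\mathcal {HC}\) counts the distinct calendar days with at least one answer in that hour; both rows are normalized and multiplied element-wise.

```python
from collections import defaultdict
from datetime import datetime


def hourly_intensity(answer_events):
    """Return {answerer_id: [I_ans(a, 0), ..., I_ans(a, 23)]} following Eqs. (6)-(8)."""
    ha = defaultdict(lambda: [0] * 24)  # HA row: number of answers per hour
    days = defaultdict(lambda: [set() for _ in range(24)])  # days with an answer in hour h
    for answerer, ts in answer_events:
        ha[answerer][ts.hour] += 1
        days[answerer][ts.hour].add(ts.date())
    intensity = {}
    for answerer, counts in ha.items():
        hc = [len(d) for d in days[answerer]]  # HC row: distinct days per hour
        total_ans, total_days = sum(counts), sum(hc)
        intensity[answerer] = [
            (counts[h] / total_ans) * (hc[h] / total_days)  # Eq. (6) x Eq. (7)
            for h in range(24)
        ]
    return intensity


# Hypothetical history: three answers at 09:xx over two days, one answer at 21:00.
events = [
    ("alice", datetime(2013, 7, 1, 9, 10)),
    ("alice", datetime(2013, 7, 1, 9, 40)),
    ("alice", datetime(2013, 7, 2, 9, 5)),
    ("alice", datetime(2013, 7, 2, 21, 0)),
]
intensity = hourly_intensity(events)
```

Here hour 9 gets \(3/4 \times 2/3 = 0.5\) and hour 21 gets \(1/4 \times 1/3 = 1/12\), so answering regularly at the same hour on several days is rewarded more than a burst of answers on a single day.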

Let \(h(\hat{q})\) be the hour at which \(\hat{q}\) is posted, and let \(\delta \) be a time window; we set \(\delta = 2\) h based on ad-hoc experiments. We then calculate the answering intensity of answerer \(a_i\) for \(\hat{q}\) around its posting time, i.e., over the period \([h(\hat{q}) - \delta , h(\hat{q}) + \delta ]\), as

$$\begin{aligned} \displaystyle \mathcal {I}_{ans}(a_i,h(\hat{q})) = \prod _{h=h(\hat{q})- \delta }^{h(\hat{q})+\delta } {\mathcal {I}_{ans}(a_i, h)}. \end{aligned}$$
(9)

Equation (9) thus aggregates the hourly answering intensities of \(a_i\) over the \(2\delta + 1\) hours centred on \(h(\hat{q})\).
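A minimal sketch of Eq. (9), assuming the per-hour intensities of one answerer are given as a 24-element list indexed by hour. The text does not say how windows that cross midnight are handled; this sketch assumes, as a labeled choice, that hours wrap around modulo 24. The function name is hypothetical.

```python
import math


def window_intensity(hourly, post_hour, delta=2):
    """Eq. (9): product of I_ans(a, h) for h = post_hour - delta, ..., post_hour + delta.

    Assumption (not stated in the source): hours outside 0..23 wrap around midnight.
    """
    return math.prod(hourly[h % 24] for h in range(post_hour - delta, post_hour + delta + 1))


hourly = [0.1] * 24
hourly[23] = 0.0  # this answerer never answers between 23:00 and 24:00
```

Since the aggregation is a product, a single hour with zero intensity inside the window (here hour 23 for a question posted at hour 0) makes the windowed intensity zero.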

3.4 Estimation of Active Experts

When a new question \(\hat{q}\) is posted at hour \(h({\hat{q}})\) of a day, the active expert score, \(Exp_{act}({a_i},{\hat{q}},h({\hat{q}}))\), of answerer \({a_i}\) for \({\hat{q}}\) is estimated as

$$\begin{aligned} \displaystyle Exp_{act}({a_i},{\hat{q}},h({\hat{q}}))= Exp({a_i},{\hat{q}}) \times \mathcal {AS}(a_i) \times \mathcal {I}_{ans}({a_i},h(\hat{q})). \end{aligned}$$
(10)

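Finally, the answerers are ranked by their active expert scores, and the top N ranked answerers are selected as the active experts to whom \(\hat{q}\) is routed.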
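A minimal sketch of Eq. (10) followed by top-N selection, assuming the three component scores have already been computed and stored in dictionaries keyed by answerer id; this layout and the function name are hypothetical.

```python
import heapq


def rank_active_experts(expertise, activeness, intensity, n):
    """Eq. (10): Exp_act = Exp x AS x I_ans; return the ids of the top-n answerers."""
    scores = {a: expertise[a] * activeness[a] * intensity[a] for a in expertise}
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical scores: "a" has the highest expertise but has been inactive.
expertise = {"a": 0.5, "b": 0.4, "c": 0.1}
activeness = {"a": 0.1, "b": 1.0, "c": 1.0}
intensity = {"a": 1.0, "b": 0.5, "c": 1.0}
top = rank_active_experts(expertise, activeness, intensity, 2)
```

In this toy case, ranking by expertise alone (EF) would place "a" first, whereas the active expert score demotes it to last because of its low activeness.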

4 Experiments and Results

We examine the performance of the proposed method on a real-world dataset, called History, which we downloaded from the Yahoo! Answers website [6]. As the name suggests, the questions in this dataset belong to the History category. The dataset consists of 78,242 resolved questions and their answers, posted from June 27, 2012 to September 06, 2013. We use the 72,784 questions posted from June 27, 2012 to July 30, 2013 as the training set and the remaining 5,458 questions, posted from July 31, 2013 to September 06, 2013, as the test set. Table 1 summarizes the dataset.

Table 1. Summary of the History dataset

First, we remove stop words from the dataset. We then stem the questions with the Porter stemmer [13], which removes common morphological and inflectional endings from words. Next, we remove from the training set the answerers who answered fewer than ten questions, which leaves 1,901 answerer profiles. To prepare the test set, we further remove the questions whose best answerers do not appear in the filtered training set, which leaves 4,261 test questions.
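The two filtering steps can be sketched as follows, assuming the training data are hypothetical `(question_id, answerer_id)` pairs and the test data are `(question_id, best_answerer_id)` pairs. Stop-word removal and Porter stemming are omitted here; NLTK, for example, provides a `PorterStemmer` class for the latter. Duplicate pairs are collapsed so that an answerer is counted once per question.

```python
from collections import Counter


def filter_data(train_answers, test_questions, min_answers=10):
    """Keep answerers with >= min_answers answered questions, then keep only
    test questions whose best answerer survived the filter."""
    counts = Counter(answerer for _, answerer in set(train_answers))
    kept = {a for a, c in counts.items() if c >= min_answers}
    test_kept = [(q, best) for q, best in test_questions if best in kept]
    return kept, test_kept


# Hypothetical data: alice answered 10 questions, bob only 9.
train = [(f"q{i}", "alice") for i in range(10)] + [(f"q{i}", "bob") for i in range(9)]
test = [("t1", "alice"), ("t2", "bob")]
kept, test_kept = filter_data(train, test)
```

The second step matters for evaluation: a test question whose best answerer has no training profile could never be routed correctly by any profile-based method.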

To investigate the effect of activeness, we use two models, EF and AEF, which route questions to experts and to active experts, respectively. To investigate the impact of incorporating answer quality, we also use two configurations of answerers' profiles. In the first configuration, the profile of a user consists of all questions that the user has answered; in the second, it consists only of the questions for which the user gave the best answer. We denote these two configurations ALL and BEST, respectively. Combining the two models with the two profile configurations gives four cases: (i) EF-ALL, (ii) EF-BEST, (iii) AEF-ALL, and (iv) AEF-BEST.
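A minimal sketch of the ALL and BEST profile configurations, assuming each training answer is given as a hypothetical `(answerer_id, question_terms, is_best_answer)` triple. Only the question terms used by the QLL model are kept; the timestamps that the performance profile also stores are left out of this sketch.

```python
from collections import defaultdict


def build_profiles(answers, config="ALL"):
    """ALL: every answered question; BEST: only questions the answerer won as best answer."""
    if config not in ("ALL", "BEST"):
        raise ValueError("config must be 'ALL' or 'BEST'")
    profiles = defaultdict(list)
    for answerer, terms, is_best in answers:
        if config == "ALL" or is_best:
            profiles[answerer].extend(terms)
    return dict(profiles)


answers = [
    ("alice", ["roman"], True),
    ("alice", ["tank"], False),
    ("bob", ["ww2"], False),
]
```

Under BEST, an answerer who never gave a best answer ends up with no profile at all, so the BEST configuration also shrinks the candidate pool.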

Table 2. MRR of different cases with different values of \(\lambda \)
Table 3. C@N for different cases with \(\lambda =0.5\)
Table 4. S@N of different cases with \(\lambda = 0.5\)

We use three performance measures: mean reciprocal rank (MRR) [7], best answer coverage (C@N) [4], and success at top N (S@N) [14]. For MRR, we vary \(\lambda \in \{ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}\) in Eq. (2) and report the results in Table 2. For C@N and S@N, we fix \(\lambda =0.5\) in Eq. (2); Tables 3 and 4 report the results for C@N with \(N \in \{1, 10, 50, 200\}\) and S@N with \(N \in \{ 1, 2, 3, 4, 5 \}\), respectively.
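A minimal sketch of MRR, assuming, consistently with the test-set preparation, that the best answerer of each test question is the single ground-truth answerer, and that a question whose best answerer is missing from the ranking contributes 0. The precise definitions used in [7], and those of C@N [4] and S@N [14], are not reproduced here; the function name is hypothetical.

```python
def mean_reciprocal_rank(rankings, best_answerers):
    """Average of 1/rank of each question's best answerer (0 if not ranked)."""
    if len(rankings) != len(best_answerers):
        raise ValueError("one ranking per test question is required")
    total = 0.0
    for ranking, best in zip(rankings, best_answerers):
        if best in ranking:
            total += 1.0 / (ranking.index(best) + 1)  # ranks are 1-based
    return total / len(rankings)


# Hypothetical rankings for three test questions.
rankings = [["a", "b", "c"], ["b", "a"], ["c"]]
best = ["a", "a", "x"]
```

Here the best answerers are ranked 1st, 2nd, and not at all, giving \((1 + 1/2 + 0)/3 = 0.5\).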

Tables 2, 3, and 4 together cover 18 comparison scenarios (9 values of \(\lambda \) for MRR, 4 values of N for C@N, and 5 values of N for S@N). AEF-ALL performs best in two of them (C@50 and C@200), and AEF-BEST performs best in the remaining 16. Moreover, in every scenario, the proposed AEF outperforms the baseline EF.

5 Conclusion

Here, we propose an active expert finding method for QR in CQA services. We use the QLL model to measure the expertise of an answerer for a given question, and we propose two measures, the activeness score and the answering intensity score. These scores assess, respectively, how recently an answerer has been active and the extent to which she consistently answers questions during a particular hour of the day. Finally, we define an active expert score by aggregating the expertise score provided by the QLL model, the activeness score, and the answering intensity score.

We investigate the performance of the proposed method on a real-world dataset, called History, which we downloaded from the Yahoo! Answers website, using three performance measures: MRR, C@N, and S@N. The proposed scheme performs best in every comparison scenario. However, \(\delta \) was chosen in a crude, ad-hoc manner; further investigation of this parameter and experiments on more datasets are required.