Finding Active Experts for Question Routing in Community Question Answering Services

Kundu, Dipankar; Pal, Rajat Kumar; Mandal, Deba Prasad

doi:10.1007/978-3-030-34872-4_36

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11942)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

In this article, we propose a method for finding active experts for a new question in order to improve the effectiveness of a question routing process. By active experts for a given question, we mean those experts who are active at the time of its posting. The proposed method uses the query likelihood language model and two new measures, activeness and answering intensity. We compare the performance of the proposed method with its baseline query likelihood language model. For this purpose, we use a real-world dataset, called History, downloaded from the Yahoo! Answers web portal. In every comparison scenario, the proposed method is found to outperform the corresponding baseline model.

1 Introduction

In community question answering (CQA) services, information seekers ask questions and other users share their knowledge by answering these questions. Due to their usefulness, these services have attracted a large number of users and have earned notable popularity. Some of the trending CQA services are Yahoo! Answers, Answerbag, Wiki answer, Baidu Zhidao, and Stackoverflow.

In CQA services, question routing (QR) is the process of routing a new question to its potential answerers. Let $\hat{q}$ be a new question represented by a sequence of terms (non-stop words) $t_j$, $j \in \{1, 2, \cdots , |\hat{q}|\}$, i.e., ${\hat{q}}=\{{t_1,t_2,\cdots ,t_{|\hat{q}|}}\}$, and let $A$ be a set of answerers. Then QR can be defined as the procedure of selecting a set of suitable answerers ${A{_{\hat{q}}}^*}\subseteq A$ who have expertise on $\hat{q}$. Generally, QR schemes follow three steps: (i) generation of answerers’ performance profiles, (ii) estimation of answerers’ expertise (based on their performance profiles), and (iii) routing of the question to the potential experts in the expert list. In this context, the performance profile of an answerer contains her answering history. Usually, it is considered to be the collection of all of the questions that she has answered previously. In our work, however, we maintain all questions with their posting and answering timestamps in the performance profile. In QR, expert finding (EF) is the process of identifying the possible experts corresponding to a given question. Let $Exp({{a_i},\hat{q}})$ be the expertise score of the answerer $a_i \in A$ corresponding to $\hat{q}$. Based on the expertise scores, a system ranks all of the answerers and finally selects the top $N$ $(N=|{A{_{\hat{q}}}^*}|)$ ranked answerers, i.e., the experts, for routing $\hat{q}$.
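The ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `route_question`, the toy profiles, and the overlap-based `toy_expertise` scorer are hypothetical stand-ins and are not part of the proposed method; any function playing the role of $Exp(a_i,\hat{q})$ could be plugged in.

```python
from typing import Callable, Iterable, List, Sequence


def route_question(
    question_terms: Sequence[str],
    answerers: Iterable[str],
    expertise: Callable[[str, Sequence[str]], float],
    top_n: int,
) -> List[str]:
    """Rank answerers by their expertise score and keep the top N."""
    ranked = sorted(
        answerers,
        key=lambda a: expertise(a, question_terms),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical toy profiles: the set of terms each answerer has dealt with.
toy_profiles = {
    "a1": {"rome", "empire", "caesar"},
    "a2": {"napoleon", "waterloo"},
    "a3": {"rome", "republic"},
}


def toy_expertise(answerer: str, terms: Sequence[str]) -> float:
    """Stand-in for Exp(a_i, q): count question terms found in the profile."""
    return float(sum(term in toy_profiles[answerer] for term in terms))


experts = route_question(["rome", "caesar"], toy_profiles, toy_expertise, top_n=2)
print(experts)
```

Here `top_n` corresponds to $N$, the size of the routed expert set.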

The performance of CQA services may be affected if the referred experts are not currently active in the community. In this case, the average waiting time for an asker to obtain the first suitable answer may be too high to be useful. Therefore, in QR, it is desirable to route a particular question to experts who are available online. We call such users active experts.

Here, we propose an active expert finding (AEF) method. It incorporates an answerer’s activeness at the time of question posting and uses the query likelihood language (QLL) model to estimate the expertise of the answerer. We not only estimate an answerer’s availability during question posting time but also consider the answering intensity during the same period. To show the effectiveness of the proposed method, we have compared it with the baseline system on a real-world dataset downloaded from Yahoo! Answers web site using three performance measures. We find that in every corresponding scenario, the proposed AEF system outperformed the baseline EF system.

2 Related Works

The literature on EF is extensive [1, 2, 4, 5, 6, 8, 9, 10, 14, 18]. To keep this paper concise, however, we discuss only some of the recent prominent works that deal with the availability of answerers [3, 7, 11, 15, 16].

In [7], the authors proposed a QR model that utilizes the language model to estimate a user’s expertise. This model integrates the quality of each answerer’s previous answers along with her login availability, and it treats the subproblem of predicting a user’s login availability as a time series forecasting problem. In [3], the authors proposed a recommendation system that takes into account the compatibility, topical expertise, and availability of users. To estimate the availability of the users, they applied three classification techniques to the users’ previous activities. In another work [16], researchers proposed a dynamic modelling approach for QR. They used two temporal discounting functions to model the availability of the users. While modelling the EF problem, the authors of [11] considered the dynamic aspects of the problem and proposed a supervised learning framework. The authors of [15] proposed a QR technique that routes questions to the users who are most suitable to answer them based on their past and recent activities. They proposed a measure that divides users’ activities into four time-based categories and assigns a weight to each category such that the more recent categories attain comparatively higher importance.

3 Proposed Method

The present investigation is concerned with the formulation of a method for finding active experts for a new question in order to improve the effectiveness of question routing schemes. It consists of four phases: (i) estimation of expertise, (ii) estimation of activeness, (iii) estimation of answering intensity, and (iv) estimation of the active experts. We now discuss these phases in detail.

3.1 Estimation of Expertise

In information retrieval, the QLL model [12] is used to estimate the similarity between a document and a given query. Treating answerers’ profiles as documents, we apply it to estimate the expertise of each answerer for a given question.

Let us assume that ${\theta _{a_i}}$ denotes the language model associated with the performance profile of the answerer $a_i$, and $\theta _C$ denotes the language model associated with the entire collection of performance profiles. Then, the expertise score of $a_i$ for a given question $\hat{q}$ is computed with the help of [12] as

$$\begin{aligned} \displaystyle Exp(a_i,\hat{q})= P(\hat{q}|{\theta _{a_i}}) ={ {{\prod _{{t}\in {\hat{q}}}} P(t|{\theta _{a_i}})}^{n(t,\hat{q})}}, \end{aligned}$$

(1)

where $n(t,\hat{q})$ is the number of occurrences of $t$ in $\hat{q}$. To avoid zero probabilities, we use the Jelinek-Mercer smoothing method [17] in (1) and obtain the following:

$$\begin{aligned} \displaystyle P(\hat{q}|{\theta _{a_i}}) = {\prod _{{t}\in {\hat{q}}}} \{ { \lambda p(t|{\theta _{a_i}}) + (1 - \lambda ) p(t|{\theta _C})}\} ^{n(t,\hat{q})}, \end{aligned}$$

(2)

where

$$\begin{aligned} \displaystyle p(t|{\theta _{a_i}}) = \dfrac{tf(t,{\theta _{a_i}})}{\sum _{{t^{\prime }}\in {\theta _{a_i}}} tf({t^{\prime }},{\theta _{a_i}})} \end{aligned}$$

(3)

and

$$\begin{aligned} \displaystyle p(t|{\theta _C}) = \dfrac{tf(t,{\theta _C})}{{\sum _{{t^{\prime }}\in {\theta _C}}} tf({t^{\prime }},{\theta _C})}. \end{aligned}$$

(4)

Here, $tf(t,{\theta })$ is the frequency of the term t in ${\theta }$ ($\theta _{a_i} \text { or } \theta _C$), and ${{\sum _{{t^{\prime }}\in {\theta }}} tf({t^{\prime }},{\theta })}$ is the total number of occurrences of all terms in ${\theta }$. Moreover, $\lambda $ $(0<\lambda <1)$ controls the influences of $\theta _{a_i}$ and $\theta _C$.
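A minimal Python sketch of Eqs. (2) to (4) is given below for illustration. It is not the authors' implementation: the function names, the toy profiles, and the choice to work in log space (to avoid numerical underflow when many small probabilities are multiplied) are assumptions made here. Profiles are represented as term-frequency counters, and the collection model is taken to be the sum of all profiles.

```python
import math
from collections import Counter
from typing import Sequence


def term_prob(term: str, tf: Counter) -> float:
    """Maximum-likelihood estimate p(t | theta) as in Eqs. (3) and (4)."""
    total = sum(tf.values())
    return tf[term] / total if total else 0.0


def log_expertise(
    question: Sequence[str],
    profile_tf: Counter,
    collection_tf: Counter,
    lam: float = 0.5,
) -> float:
    """Log of Eq. (2): Jelinek-Mercer smoothed query likelihood."""
    score = 0.0
    for term, n in Counter(question).items():
        p = lam * term_prob(term, profile_tf) + (1 - lam) * term_prob(term, collection_tf)
        if p == 0.0:
            # The term is absent from the whole collection.
            return float("-inf")
        score += n * math.log(p)
    return score


# Hypothetical toy profiles (stemmed, stop-word-free terms).
profile_a1 = Counter(["rome", "rome", "empire", "caesar"])
profile_a2 = Counter(["napoleon", "waterloo", "empire", "france"])
collection = profile_a1 + profile_a2

question = ["rome", "empire"]
exp_a1 = math.exp(log_expertise(question, profile_a1, collection))
exp_a2 = math.exp(log_expertise(question, profile_a2, collection))
print(exp_a1, exp_a2)
```

In this toy case, smoothing keeps the score of the second answerer non-zero even though "rome" never occurs in her profile, which is the purpose of the $(1-\lambda )\,p(t|\theta _C)$ term.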

3.2 Estimation of Activeness

To incorporate the activeness of an answerer in the proposed method, we define the activeness score of an answerer ${a_i}$ as

$$\begin{aligned} \displaystyle \mathcal {AS}(a_i)=\exp \left( -\left( {\mathtt {time}_{c}}-{\mathtt {time}_{a_i}}\right) /{{(24 \times 7)}}\right) , \end{aligned}$$

(5)

where ${\mathtt {time}_{c}}$ is the current system time and ${\mathtt {time}_{a_i}}$ is the last answering time of $a_i$. Here, $({\mathtt {time}_{c}}-{\mathtt {time}_{a_i}})$ is measured in hours, so the divisor $24 \times 7 = 168$ is the number of hours in a week and the score falls by a factor of $e$ for every week of inactivity. Note that $\mathcal {AS}(\cdot )$ is an exponentially decreasing function, which ensures that an answerer who has answered recently obtains a high activeness score.
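As a small illustration of Eq. (5), the following sketch computes the activeness score from two timestamps. The function name and the example dates are assumptions chosen for this example only.

```python
import math
from datetime import datetime

HOURS_PER_WEEK = 24 * 7  # the divisor used in Eq. (5)


def activeness_score(current_time: datetime, last_answer_time: datetime) -> float:
    """Eq. (5): exponential decay in the hours since the last answer."""
    elapsed_hours = (current_time - last_answer_time).total_seconds() / 3600.0
    return math.exp(-elapsed_hours / HOURS_PER_WEEK)


now = datetime(2013, 7, 31, 12, 0)
just_answered = activeness_score(now, now)
one_week_idle = activeness_score(now, datetime(2013, 7, 24, 12, 0))
print(just_answered, one_week_idle)
```

An answerer who has just answered receives the maximum score of 1, and one who has been idle for exactly one week receives $e^{-1}\approx 0.37$.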

3.3 Estimation of Answering Intensity

A suitable expert should not only be active (as we argued in the previous section), but her participation in answering questions should also be high. Accordingly, we introduce a new measure, called answering intensity, that incorporates the hourly answering activity and consistency of each answerer during each hour of a day.

In this regard, we construct an hourly answering activity matrix, denoted by $\mathcal {HA}$, of size $|A|\times |H|$. Here, $\mathcal {HA}_{a_i, h}$ represents the total number of answers submitted by the answerer $a_i$ ($a_i \in A$) in the $h^{th}$ ($h \in H=\{0, 1, \cdots , 23\}$) hour. Then, similar to [3], $a_i$’s answering activity during the $h^{th}$ hour is estimated as

$$\begin{aligned} \displaystyle Ans_{act}({a_i},h)=\dfrac{ {\mathcal {HA}_{a_i, h}}}{ {\sum _{\hat{h}\in H}\mathcal {HA}_{a_i, \hat{h}}}}. \end{aligned}$$

(6)

Next, to take into account the consistency of an answerer, we construct an hourly consistency matrix, denoted by $\mathcal {HC}$, of size $|A| \times |H|$. Here, $\mathcal {HC}_{a_i, h}$ indicates the number of days (under consideration) on which the answerer $a_i$ answered during the $h^{th}$ hour. Now, we estimate $a_i$’s answering consistency during the $h^{th}$ hour as

$$\begin{aligned} \displaystyle \mathcal {H}_{con}({a_i},h)= \dfrac{{\mathcal {HC}_{a_i, h}}}{ {\sum _{\hat{h}\in H}\mathcal {HC}_{a_i, \hat{h}}}}. \end{aligned}$$

(7)

Then, we calculate the answering intensity of ${a_i}$ during the $h^{th}$ hour as

$$\begin{aligned} \displaystyle \mathcal {I}_{ans}({a_i},h)= {Ans_{act}({a_i},h)} \times {\mathcal {H}_{con}({a_i},h)}. \end{aligned}$$

(8)

Let $h(\hat{q})$ be the hour when $\hat{q}$ is posted and $\delta $ be the time window. We use $\delta = 2$ h, which was chosen based on our ad hoc experiments. Now, we calculate the answering intensity of an answerer $a_i$ for a question $\hat{q}$ at the time of posting of the question, i.e., for the time period $[h(\hat{q}) - \delta , h(\hat{q}) + \delta ]$, as

$$\begin{aligned} \displaystyle \mathcal {I}_{ans}(a_i,h(\hat{q})) = \prod _{h=h(\hat{q})- \delta }^{h(\hat{q})+\delta } {\mathcal {I}_{ans}(a_i, h)}. \end{aligned}$$

(9)

Equation (9) thus aggregates the hourly answering intensities over the window around the posting hour $h(\hat{q})$.
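Equations (6) to (9) can be illustrated for a single answerer with the sketch below. Several details are assumptions made here and are not stated in the text: the input is a plain list of answering timestamps, the function names are hypothetical, and hours outside $\{0,\cdots ,23\}$ are wrapped around midnight with a modulo-24 operation.

```python
from datetime import datetime
from typing import List, Sequence

HOURS = 24


def hourly_intensity(answer_times: Sequence[datetime]) -> List[float]:
    """Eqs. (6)-(8): per-hour activity times per-hour consistency."""
    ha = [0] * HOURS                                # answers per hour (row of HA)
    days_per_hour = [set() for _ in range(HOURS)]   # distinct days per hour
    for ts in answer_times:
        ha[ts.hour] += 1
        days_per_hour[ts.hour].add(ts.date())
    hc = [len(days) for days in days_per_hour]      # row of HC
    total_ha, total_hc = sum(ha), sum(hc)
    if total_ha == 0:
        return [0.0] * HOURS
    return [(ha[h] / total_ha) * (hc[h] / total_hc) for h in range(HOURS)]


def window_intensity(intensity: Sequence[float], posting_hour: int, delta: int = 2) -> float:
    """Eq. (9): product of hourly intensities over [h - delta, h + delta]."""
    product = 1.0
    for h in range(posting_hour - delta, posting_hour + delta + 1):
        product *= intensity[h % HOURS]  # assumption: wrap around midnight
    return product


# Hypothetical answering history of one answerer over two days.
answers = [
    datetime(2013, 7, 1, 9, 15),
    datetime(2013, 7, 1, 10, 5),
    datetime(2013, 7, 1, 10, 40),
    datetime(2013, 7, 1, 11, 30),
    datetime(2013, 7, 2, 10, 20),
]
intensity = hourly_intensity(answers)
narrow = window_intensity(intensity, posting_hour=10, delta=1)
wide = window_intensity(intensity, posting_hour=10, delta=2)
print(intensity[10], narrow, wide)
```

Because Eq. (9) is a product, a single hour with no recorded answers inside the window drives the result to zero, as the `wide` value in this toy example shows.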

3.4 Estimation of Active Experts

When a new question $\hat{q}$ is posted at hour $h({\hat{q}})$ of a day, the active expert score, $Exp_{act}({a_i},{\hat{q}},h({\hat{q}}))$, of answerer ${a_i}$ for ${\hat{q}}$ is estimated as

$$\begin{aligned} \displaystyle Exp_{act}({a_i},{\hat{q}},h({\hat{q}}))= Exp({a_i},{\hat{q}}) \times \mathcal {AS}(a_i) \times \mathcal {I}_{ans}({a_i},h(\hat{q})). \end{aligned}$$

(10)

Finally, the answerers are ranked using the active expert score.
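The combination in Eq. (10) and the final ranking can be sketched as follows. The three input dictionaries hold made-up scores for two hypothetical answerers; in practice they would come from Eqs. (2), (5), and (9). The function name is an assumption.

```python
from typing import Dict, List, Tuple


def rank_active_experts(
    expertise: Dict[str, float],
    activeness: Dict[str, float],
    intensity: Dict[str, float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Eq. (10): multiply the three scores and rank answerers by the product."""
    scores = {
        a: expertise[a] * activeness[a] * intensity[a]
        for a in expertise
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up scores: a1 is the stronger expert, but a2 is far more active.
expertise = {"a1": 0.09375, "a2": 0.03125}
activeness = {"a1": 0.1, "a2": 0.9}
intensity = {"a1": 0.001, "a2": 0.002}

ranking = rank_active_experts(expertise, activeness, intensity, top_n=2)
print(ranking)
```

In this toy case the ranking by expertise alone would place a1 first, whereas the active expert score places a2 first.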

4 Experiments and Results

We examine the performance of the proposed method on a real-world dataset, called History, which we downloaded from the Yahoo! Answers web site [6]. As the name suggests, the questions in this dataset belong to the History category. It consists of 78,242 resolved questions and their answers posted from June 27, 2012 to September 06, 2013. We have marked the 72,784 questions posted from June 27, 2012 to July 30, 2013 as the training set. The remaining 5,458 questions, posted from July 31, 2013 to September 06, 2013, are marked as the test set. A summary of this dataset is provided in Table 1.

Table 1. Summary of the history dataset

Initially, we remove stop words from the dataset. Then, we stem the questions with the Porter stemmer [13], which removes common morphological and inflectional endings. Moreover, from the training dataset, we remove the answerers who answered fewer than ten questions. After this process, we obtain 1,901 answerers’ profiles in the training dataset. Furthermore, to prepare the test dataset, we remove the questions whose best answerers do not exist in the filtered training set. Thus, we obtain 4,261 questions in the test dataset.
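The two filtering steps described above (dropping infrequent answerers, then dropping test questions whose best answerer is no longer present) could be sketched as below. The record layout, the function names, and the toy data are assumptions for illustration; stop-word removal and stemming are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def frequent_answerers(
    train_answers: Iterable[Tuple[str, str]],
    min_questions: int = 10,
) -> Set[str]:
    """Keep answerers who answered at least `min_questions` distinct questions."""
    questions_by_answerer = defaultdict(set)
    for question_id, answerer in train_answers:
        questions_by_answerer[answerer].add(question_id)
    return {
        answerer
        for answerer, questions in questions_by_answerer.items()
        if len(questions) >= min_questions
    }


def usable_test_questions(
    best_answerer_by_question: Dict[str, str],
    kept_answerers: Set[str],
) -> Dict[str, str]:
    """Keep test questions whose best answerer survived the training filter."""
    return {
        question_id: answerer
        for question_id, answerer in best_answerer_by_question.items()
        if answerer in kept_answerers
    }


# Toy data: a1 answered ten training questions, a2 only nine.
train = [(f"q{i}", "a1") for i in range(10)] + [(f"q{i}", "a2") for i in range(9)]
kept = frequent_answerers(train)
test_set = usable_test_questions({"t1": "a1", "t2": "a2", "t3": "a3"}, kept)
print(kept, test_set)
```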

To investigate the effect of activeness, we use two models: EF and AEF. In EF and AEF, we route the questions to the experts and active experts, respectively. Furthermore, to investigate the impact of the incorporation of answer quality in our method, we use two different configurations of the answerers’ profiles. In the first configuration, we consider all questions that a user has answered as the profile of the user. In the second configuration, to constitute the profile of a user, we consider only the questions for which the user has given the best answer. We denote these two configurations as ALL and BEST, respectively. As discussed earlier, we have two models, namely, EF and AEF and two configurations of user profiles, namely, ALL and BEST. Thus, we have four cases: (i) EF-ALL, (ii) EF-BEST, (iii) AEF-ALL, and (iv) AEF-BEST.
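The difference between the ALL and BEST profile configurations can be made concrete with a short sketch. The record format `(answerer, question_terms, is_best_answer)` and the function name are hypothetical choices for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, List[str], bool]  # (answerer, question terms, is best answer)


def build_profiles(records: Iterable[Record], best_only: bool = False) -> Dict[str, Counter]:
    """Build term-frequency profiles; BEST keeps only best-answered questions."""
    profiles = defaultdict(Counter)
    for answerer, terms, is_best in records:
        if best_only and not is_best:
            continue
        profiles[answerer].update(terms)
    return dict(profiles)


records = [
    ("a1", ["rome", "empire"], True),
    ("a1", ["napoleon"], False),
    ("a2", ["rome"], False),
]
all_profiles = build_profiles(records)                    # ALL configuration
best_profiles = build_profiles(records, best_only=True)   # BEST configuration
print(all_profiles, best_profiles)
```

Note that an answerer with no best answers, such as a2 in this toy data, has no profile at all under the BEST configuration.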

Table 2. MRR of different cases with different values of $\lambda $

Table 3. C@N for different cases with $\lambda =0.5$

Table 4. S@N of different cases with $\lambda = 0.5$

We use three performance measures here: mean reciprocal rank (MRR) [7], best answer coverage (C@N) [4], and success at the top N (S@N) [14]. For MRR, we experiment with $\lambda \in \{ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$ in Eq. (2) and provide the results in Table 2. For C@N and S@N, however, we use $\lambda =0.5$ in Eq. (2). In Tables 3 and 4, we provide the results using C@N with $N \in \{1, 10, 50, 200\}$ and S@N with $N \in \{ 1, 2, 3, 4, 5 \}$, respectively.
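For orientation, two of these measures can be sketched in Python using their commonly used formulations, which are assumed here because the text does not restate the definitions from [7] and [14]: MRR averages the reciprocal of the rank at which the best answerer appears, and S@N is the fraction of questions whose best answerer appears among the top N. C@N is omitted because its definition is not given in the text. All names and data below are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank(ranking: List[str], best_answerer: str) -> float:
    """1 / rank of the best answerer, or 0 if she is not ranked at all."""
    if best_answerer not in ranking:
        return 0.0
    return 1.0 / (ranking.index(best_answerer) + 1)


def mean_reciprocal_rank(rankings: Dict[str, List[str]], best: Dict[str, str]) -> float:
    """Average reciprocal rank over all test questions."""
    return sum(reciprocal_rank(rankings[q], best[q]) for q in best) / len(best)


def success_at_n(rankings: Dict[str, List[str]], best: Dict[str, str], n: int) -> float:
    """Fraction of questions whose best answerer is ranked in the top n."""
    hits = sum(best[q] in rankings[q][:n] for q in best)
    return hits / len(best)


# Toy rankings for three test questions; a1 is the best answerer of each.
rankings = {
    "q1": ["a1", "a2", "a3"],
    "q2": ["a2", "a3", "a1"],
    "q3": ["a3", "a1", "a2"],
}
best = {"q1": "a1", "q2": "a1", "q3": "a1"}

mrr = mean_reciprocal_rank(rankings, best)
s_at_1 = success_at_n(rankings, best, 1)
s_at_2 = success_at_n(rankings, best, 2)
print(mrr, s_at_1, s_at_2)
```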

From Tables 2, 3, and 4, we find that AEF-ALL obtained the best performance in two comparison scenarios (C@50 and C@200), and AEF-BEST obtained the best performance in each of the other 16 comparison scenarios. We also observe that in every comparison scenario, the proposed AEF outperforms the baseline EF.

5 Conclusion

Here, we propose an active expert finding method for QR in CQA services. We use the QLL model to measure the expertise of an answerer for a given question. We propose two measures, called the activeness score and the answering intensity score. These two scores assess the activeness of an answerer and the extent to which an answerer is consistent in answering questions during a particular hour of the day. Finally, we aggregate the expertise score provided by the QLL model, the activeness score, and the answering intensity score to define an active expert score.

We investigate the performance of the proposed method using a real-world dataset, called History, which has been downloaded from the Yahoo! Answers web site. We use three performance measures: MRR, C@N, and S@N. The proposed scheme is found to perform the best in every comparison scenario. However, the value of $\delta $ was chosen in a crude way. A further investigation of the parameter $\delta $ and experiments on more datasets are required.

References

1. Aslay, Ç., O’Hare, N., Aiello, L.M., Jaimes, A.: Competition-based networks for expert finding. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1033–1036. ACM (2013)
2. Bouguessa, M., Dumoulin, B., Wang, S.: Identifying authoritative actors in question-answering forums: the case of Yahoo! answers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, pp. 866–874. ACM (2008)
3. Chang, S., Pal, A.: Routing questions for collaborative answering in community question answering. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 494–501. ACM (2013)
4. Fang, L., Huang, M., Zhu, X.: Question routing in community based QA: incorporating answer quality and answer content. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, MDS 2012, pp. 5:1–5:8. ACM (2012)
5. Jiang, J., Lu, W.: IR-based expert finding using filtered collection. In: 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1–5, October 2008
6. Kundu, D., Mandal, D.P.: Formulation of a hybrid expertise retrieval system in community question answering services. Appl. Intell. 49(2), 463–477 (2019)
7. Li, B., King, I.: Routing questions to appropriate answerers in community question answering services. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1585–1588. ACM (2010)
8. Liu, D.R., Chen, Y.H., Kao, W.C., Wang, H.W.: Integrating expert profile, reputation and link analysis for expert finding in question-answering websites. Inf. Process. Manag. 49(1), 312–329 (2013)
9. Liu, J., Song, Y.I., Lin, C.Y.: Competition-based user expertise score estimation. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, pp. 425–434. ACM (2011)
10. Mandal, D.P., Kundu, D., Maiti, S.: Finding experts in community question answering services: a theme based query likelihood language approach. In: 2015 International Conference on Advances in Computer Engineering and Applications, pp. 423–427, March 2015
11. Neshati, M., Fallahnejad, Z., Beigy, H.: On dynamicity of expert finding in community question answering. Inf. Process. Manag. 53(5), 1026–1042 (2017)
12. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 275–281. ACM (1998)
13. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
14. Riahi, F., Zolaktaf, Z., Shafiei, M., Milios, E.: Finding expert users in community question answering. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012 Companion, pp. 791–798. ACM, New York (2012)
15. Roy, P.K., Singh, J.P., Nag, A.: Finding active expert users for question routing in community question answering sites. In: Perner, P. (ed.) MLDM 2018. LNCS (LNAI), vol. 10935, pp. 440–451. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96133-0_33
16. Yeniterzi, R., Callan, J.: Moving from static to dynamic modeling of expertise for question routing in CQA sites. In: International AAAI Conference on Web and Social Media (2015)
17. Zhai, C., Lafferty, J.: A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst. 22(2), 179–214 (2004)
18. Zhang, J., Ackerman, M.S., Adamic, L.: Expertise networks in online communities: structure and algorithms. In: Proceedings of the 16th International Conference on World Wide Web, pp. 221–230. ACM (2007)

Author information

Authors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Dipankar Kundu & Deba Prasad Mandal
Department of CSE, University of Calcutta, Kolkata, India
Rajat Kumar Pal

Corresponding author

Correspondence to Dipankar Kundu.

Editor information

Editors and Affiliations

Tezpur University, Tezpur, India
Bhabesh Deka
Indian Statistical Institute, Kolkata, India
Pradipta Maji
Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Tezpur University, Tezpur, India
Dhruba Kumar Bhattacharyya
Indian Institute of Technology Guwahati, Guwahati, India
Prabin Kumar Bora
Indian Statistical Institute, Kolkata, India
Sankar Kumar Pal

Cite this paper

Kundu, D., Pal, R.K., Mandal, D.P. (2019). Finding Active Experts for Question Routing in Community Question Answering Services. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D., Bora, P., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science, vol 11942. Springer, Cham. https://doi.org/10.1007/978-3-030-34872-4_36

DOI: https://doi.org/10.1007/978-3-030-34872-4_36
Published: 25 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34871-7
Online ISBN: 978-3-030-34872-4
eBook Packages: Computer Science, Computer Science (R0)
