Abstract
In this article, we propose a method for finding active experts for a new question in order to improve the effectiveness of the question routing process. By active experts for a given question, we mean those experts who are active at the time of its posting. The proposed method uses the query likelihood language model and two new measures, activeness and answering intensity. We compare the performance of the proposed method with that of its baseline, the query likelihood language model. For this purpose, we use a real-world dataset, called History, downloaded from the Yahoo! Answers web portal. In every comparison scenario, the proposed method is found to outperform the corresponding baseline model.
1 Introduction
In community question answering (CQA) services, information seekers ask questions and other users share their knowledge by answering these questions. Due to their usefulness, these services have attracted a large number of users and have earned notable popularity. Some of the trending CQA services are Yahoo! Answers, Answerbag, Wiki answer, Baidu Zhidao, and Stackoverflow.
In CQA services, question routing (QR) is the process of routing a new question to its potential answerers. Let \(\hat{q}\) be a new question represented by a sequence of terms (non-stop words) \(t_j\), \(j \in \{1, 2, \cdots , |\hat{q}|\}\), i.e., \({\hat{q}}=\{{t_1,t_2,\cdots ,t_{|\hat{q}|}}\}\), and let A be a set of answerers. Then QR can be defined as the procedure of selecting a set of suitable answerers \({A{_{\hat{q}}}^*}\subseteq A\) who have expertise on \(\hat{q}\). Generally, QR schemes follow three steps: (i) generation of answerers’ performance profiles, (ii) estimation of answerers’ expertise (based on their performance profiles), and (iii) routing of the question to the potential experts in the expert list. In this context, the performance profile of an answerer contains her answering history. Usually, it is taken to be the collection of all the questions that she has answered previously. In our work, however, we store every question in the performance profile together with its posting timestamp and answering timestamp. In QR, expert finding (EF) is the process of identifying the possible experts for a given question. Let \(Exp({{a_i},\hat{q}})\) be the expertise score of the answerer \(a_i \in A\) corresponding to \(\hat{q}\). Depending upon the expertise scores, a system ranks all of the answerers and finally selects the top N \((N=|{A{_{\hat{q}}}^*}|)\) ranked answerers, i.e., experts, for routing \(\hat{q}\).
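As a minimal illustration of the final selection step, the following Python sketch picks the top N answerers once the expertise scores \(Exp({{a_i},\hat{q}})\) are available. The function name `route_question` and the toy scores are hypothetical and serve only as an example.

```python
import heapq

def route_question(expertise_scores, n):
    """Return the n answerers with the highest expertise scores, best first.

    expertise_scores maps an answerer id to Exp(a_i, q_hat).
    """
    return heapq.nlargest(n, expertise_scores, key=expertise_scores.get)

# Toy scores for four answerers (illustrative values only).
scores = {"a1": 0.12, "a2": 0.40, "a3": 0.05, "a4": 0.33}
top2 = route_question(scores, 2)
```

Here `top2` plays the role of \({A{_{\hat{q}}}^*}\) with N = 2.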
The performance of CQA services may be affected if the referred experts are not currently active in the community. In this case, the average waiting time for an asker to obtain the first suitable answer may be too high to be useful. Therefore, in QR, it is desirable to route a particular question to experts who are available online. We call such users active experts.
Here, we propose an active expert finding (AEF) method. It incorporates an answerer’s activeness at the time of question posting and uses the query likelihood language (QLL) model to estimate the expertise of the answerer. We not only estimate an answerer’s availability during question posting time but also consider the answering intensity during the same period. To show the effectiveness of the proposed method, we compare it with the baseline system on a real-world dataset downloaded from the Yahoo! Answers web site, using three performance measures. We find that, in every comparison scenario, the proposed AEF system outperforms the baseline EF system.
2 Related Works
The literature on EF is quite rich [1, 2, 4, 5, 6, 8, 9, 10, 14, 18]. To keep this paper concise, however, we discuss only some of the recent prominent works that deal with the availability of answerers [3, 7, 11, 15, 16].
In [7], the authors proposed a QR model that uses a language model to estimate a user’s expertise. This model integrates the quality of each answerer’s previous answers with her login availability, and treats the subproblem of predicting a user’s login availability as a time series forecasting problem. In [3], the authors proposed a recommendation system that takes into account the compatibility, topical expertise, and availability of users. To estimate the availability of the users, they applied three classification techniques to the users’ previous activities. In another work [16], researchers proposed a dynamic modelling approach for QR. They used two temporal discounting functions to model the availability of the users. While modelling the EF problem, the authors of [11] considered the dynamic aspects of the problem and proposed a supervised learning framework. The authors of [15] proposed a QR technique that routes questions to the users who are most suitable to answer them based on their past and recent activities. To assess users’ activities, they proposed a measure that divides these activities into four categories according to time. They also assigned a weight to each category such that the more recent categories attain comparatively higher importance.
3 Proposed Method
We formulate a method for finding active experts for a new question in order to improve the effectiveness of question routing schemes. The method consists of four phases: (i) estimation of expertise, (ii) estimation of activeness, (iii) estimation of answering intensity, and (iv) estimation of the active experts. We now discuss these phases in detail.
3.1 Estimation of Expertise
In information retrieval, the QLL model [12] is used to estimate the similarity between a document and a given query. Treating answerers’ profiles as documents, we apply it to estimate the expertise of each answerer for a given question.
Let us assume that \({\theta _{a_i}}\) denotes the language model associated with the performance profile of the answerer \(a_i\), and \(\theta _C\) denotes the language model associated with the entire collection of performance profiles. Then, the expertise score of \(a_i\) for a given question \(\hat{q}\) is computed with the help of [12] as
where \(n(t,\hat{q})\) is the number of occurrences of t in \(\hat{q}\). To avoid zero probabilities, we use the Jelinek-Mercer smoothing method [17] in (1) and obtain the following:
Here, \(tf(t,{\theta })\) is the frequency of the term t in \({\theta }\) (\(\theta _{a_i} \text { or } \theta _C\)), and \({{\sum _{{t^{\prime }}\in {\theta }}} tf({t^{\prime }},{\theta })}\) is the total number of occurrences of all terms in \({\theta }\). Moreover, \(\lambda \) \((0<\lambda <1)\) controls the influences of \(\theta _{a_i}\) and \(\theta _C\).
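The scoring described in this section can be sketched in Python as follows. The sketch assumes the common query likelihood form \(Exp(a_i,\hat{q}) = \prod_{t \in \hat{q}} P(t|\theta_{a_i})^{n(t,\hat{q})}\) and the Jelinek-Mercer convention of [17], \(P(t|\theta_{a_i}) = (1-\lambda)\,P_{ml}(t|\theta_{a_i}) + \lambda\,P_{ml}(t|\theta_C)\). Placing \(\lambda\) on the collection model, working in log space, skipping terms that never occur in the collection, and all function names and toy profiles are assumptions made for illustration.

```python
import math
from collections import Counter

def jm_probability(term, profile_tf, profile_len, coll_tf, coll_len, lam):
    """Jelinek-Mercer smoothed P(term | profile); lam weights the collection model (assumed)."""
    p_profile = profile_tf.get(term, 0) / profile_len if profile_len else 0.0
    p_coll = coll_tf.get(term, 0) / coll_len
    return (1.0 - lam) * p_profile + lam * p_coll

def log_expertise(question_terms, profile_terms, coll_tf, lam=0.5):
    """Log of the query likelihood of the question under the answerer's profile."""
    profile_tf = Counter(profile_terms)
    profile_len = sum(profile_tf.values())
    coll_len = sum(coll_tf.values())
    score = 0.0
    for term, n in Counter(question_terms).items():
        p = jm_probability(term, profile_tf, profile_len, coll_tf, coll_len, lam)
        if p == 0.0:
            continue  # term absent from the whole collection: skipped (assumption)
        score += n * math.log(p)  # n = n(t, q_hat), the term count in the question
    return score

# Toy stemmed profiles (illustrative only).
profiles = {
    "a1": ["roman", "empir", "caesar", "roman"],
    "a2": ["civil", "war", "lincoln", "war"],
}
coll_tf = Counter(t for terms in profiles.values() for t in terms)
question = ["roman", "war", "roman"]
scores = {a: log_expertise(question, terms, coll_tf, lam=0.5)
          for a, terms in profiles.items()}
```

Because the logarithm is monotone, ranking answerers by the log score gives the same order as ranking by the product itself, while avoiding numerical underflow for long questions.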
3.2 Estimation of Activeness
To incorporate the activeness of an answerer in the proposed method, we define the activeness score of an answerer \({a_i}\) as
where \({\mathtt {time}_{c}}\) is the current system time and \({\mathtt {time}_{a_i}}\) is the last answering time of \(a_i\). Here, \(({\mathtt {time}_{c}}-{\mathtt {time}_{a_i}})\) is measured in hours. Note that \(\mathcal {AS}(\cdot )\) is an exponentially decreasing function, which ensures that an answerer who has answered recently obtains a high activeness score.
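A minimal Python sketch of an exponentially decreasing activeness score is given below. It assumes the form \(\exp(-\kappa \, ({\mathtt {time}_{c}}-{\mathtt {time}_{a_i}}))\) with the difference in hours. The rate \(\kappa\) (the `decay` argument) and the function name are hypothetical and are not taken from the definition above.

```python
import math
from datetime import datetime

def activeness_score(time_c, time_last_answer, decay=1.0):
    """Exponentially decreasing score of the hours since the last answer.

    decay is a hypothetical rate parameter introduced for illustration.
    """
    hours = (time_c - time_last_answer).total_seconds() / 3600.0
    return math.exp(-decay * max(hours, 0.0))  # clamp guards against clock skew

now = datetime(2013, 7, 31, 12, 0)
recent = activeness_score(now, datetime(2013, 7, 31, 11, 0))  # answered 1 h ago
stale = activeness_score(now, datetime(2013, 7, 30, 12, 0))   # answered 24 h ago
```

Under this assumed form, an answerer who replied one hour ago scores far higher than one who last replied a day ago, which is the behaviour the text requires.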
3.3 Estimation of Answering Intensity
A suitable expert should not only be active (as we argued in the previous section), but her participation in answering questions should also be high. Accordingly, we introduce a new measure, called answering intensity, that incorporates the hourly answering activity and consistency of each answerer during each hour of a day.
In this regard, we construct an answerer hourly answering activity matrix, denoted by \(\mathcal {HA}\), of size \(|A|\times |H|\). Here, \(\mathcal {HA}_{a_i, h}\) represents the total number of answers submitted by the answerer \(a_i\) (\(a_i \in A\)) in the \(h^{th}\) (\(h \in H=\{0, 1, \cdots , 23\}\)) hour. Then, similar to [3], \(a_i\)’s answering activity during the \(h^{th}\) hour is estimated as
Next, to take into account the consistency of an answerer, we construct an hourly consistency matrix, denoted by \(\mathcal {HC}\), of size \(|A| \times |H|\). Here, \(\mathcal {HC}_{a_i, h}\) indicates the number of days (under consideration) on which the answerer \(a_i\) answered during the \(h^{th}\) hour. Now, we estimate \(a_i\)’s answering consistency during the \(h^{th}\) hour as
Then, we calculate the answering intensity of \({a_i}\) during the \(h^{th}\) hour as
Let \(h(\hat{q})\) be the hour when \(\hat{q}\) is posted and \(\delta \) be the time window. We use \(\delta = 2\) h, which has been chosen based on our ad-hoc experiments. Now, we calculate the answering intensity of an answerer \(a_i\) for a question \(\hat{q}\) at the time of posting of the question, i.e., for the time period \([h(\hat{q}) - \delta , h(\hat{q}) + \delta ]\), as
Equation (9) attains the effect on the answering intensity during the \(h^{th}\) hour.
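The two count matrices described in this section can be built from an answer log as in the following Python sketch. Only the raw counts \(\mathcal {HA}\) and \(\mathcal {HC}\) and the hours covered by the window \([h(\hat{q}) - \delta , h(\hat{q}) + \delta ]\) are computed; the normalisation into activity, consistency, and intensity scores is not attempted. The function names, the log format, and the wrap-around of the window past midnight are assumptions.

```python
from collections import defaultdict
from datetime import datetime

def build_hourly_matrices(answers):
    """Build HA (answer counts) and HC (distinct-day counts) per answerer and hour.

    answers is an iterable of (answerer_id, answer_datetime) pairs.
    """
    ha = defaultdict(lambda: [0] * 24)
    days_seen = defaultdict(lambda: [set() for _ in range(24)])
    for answerer, ts in answers:
        ha[answerer][ts.hour] += 1                   # one more answer in this hour
        days_seen[answerer][ts.hour].add(ts.date())  # remember which day it was
    hc = {a: [len(days) for days in per_hour] for a, per_hour in days_seen.items()}
    return dict(ha), hc

def window_hours(h, delta=2):
    """Hours in [h - delta, h + delta], wrapping around midnight (assumption)."""
    return [(h + k) % 24 for k in range(-delta, delta + 1)]

# Toy answer log for one answerer (illustrative only).
log = [
    ("a1", datetime(2013, 7, 1, 9, 15)),
    ("a1", datetime(2013, 7, 1, 9, 40)),
    ("a1", datetime(2013, 7, 2, 9, 5)),
    ("a1", datetime(2013, 7, 2, 23, 30)),
]
ha, hc = build_hourly_matrices(log)
```

In the toy log, `a1` gave three answers in hour 9 spread over two days, so the activity count and the consistency count for that hour differ.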
3.4 Estimation of Active Experts
When a new question \(\hat{q}\) is posted at hour \(h({\hat{q}})\) of a day, the active expert score, \(Exp_{act}({a_i},{\hat{q}},h({\hat{q}}))\), of answerer \({a_i}\) for \({\hat{q}}\) is estimated as
Finally, the answerers are ranked by their active expert scores.
4 Experiments and Results
We examine the performance of the proposed method on a real-world dataset, called History, which we downloaded from the Yahoo! Answers web site [6]. As the name of the dataset suggests, the questions in this dataset belong to the History category. It consists of 78,242 resolved questions and their answers posted from June 27, 2012 to September 06, 2013. We have marked the 72,784 questions posted from June 27, 2012 to July 30, 2013 as the training set. The remaining 5,458 questions, posted from July 31, 2013 to September 06, 2013, are marked as the test set. A summary of this dataset is provided in Table 1.
Initially, we remove stop words from the dataset. Then, we stem the questions with the Porter stemmer [13], which removes the common morphological and inflectional endings. Moreover, from the training dataset, we remove the answerers who answered fewer than ten questions. After this process, we obtain 1,901 answerers’ profiles in the training dataset. Furthermore, to prepare the test dataset, we remove the questions whose best answerers do not exist in the filtered training set. Thus, we obtain 4,261 questions in the test dataset.
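The two filtering steps above can be sketched in Python as follows. The data layout (pairs of answerer and question ids, and a mapping from test question to best answerer) and the function names are hypothetical. Stop-word removal and stemming are not shown.

```python
from collections import Counter

def filter_answerers(train_answers, min_questions=10):
    """Keep answerers who answered at least min_questions distinct training questions.

    train_answers is an iterable of (answerer_id, question_id) pairs.
    """
    counts = Counter()
    seen = set()
    for answerer, question in train_answers:
        if (answerer, question) not in seen:  # count each question once per answerer
            seen.add((answerer, question))
            counts[answerer] += 1
    return {a for a, c in counts.items() if c >= min_questions}

def filter_test_questions(test_best_answerer, kept_answerers):
    """Keep test questions whose best answerer survived the training filter."""
    return {q: a for q, a in test_best_answerer.items() if a in kept_answerers}

# Toy data: a1 answered ten questions, a2 only nine (illustrative only).
train = [("a1", f"q{i}") for i in range(10)] + [("a2", f"q{i}") for i in range(9)]
kept = filter_answerers(train)
test_set = filter_test_questions({"t1": "a1", "t2": "a2", "t3": "a9"}, kept)
```

Applying the test filter after the training filter mirrors the order described in the text, so a test question is dropped whenever its best answerer has no profile to rank.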
To investigate the effect of activeness, we use two models: EF and AEF. In EF and AEF, we route the questions to the experts and active experts, respectively. Furthermore, to investigate the impact of incorporating answer quality in our method, we use two different configurations of the answerers’ profiles. In the first configuration, we consider all questions that a user has answered as the profile of the user. In the second configuration, the profile of a user consists only of the questions for which the user has given the best answer. We denote these two configurations as ALL and BEST, respectively. Combining the two models with the two profile configurations gives four cases: (i) EF-ALL, (ii) EF-BEST, (iii) AEF-ALL, and (iv) AEF-BEST.
We use three performance measures here: mean reciprocal rank (MRR) [7], best answer coverage (C@N) [4], and success at the top N (S@N) [14]. When we use MRR, we experiment with \(\lambda \in \{ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}\) in Eq. (2) and provide the results in Table 2. However, when we use C@N and S@N for the comparison, we use \(\lambda =0.5\) in Eq. (2). In Tables 3 and 4, we provide the results using C@N with \(N \in \{1, 10, 50, 200\}\) and S@N with \(N \in \{ 1, 2, 3, 4, 5 \}\), respectively.
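Two of these measures can be sketched in Python under their usual definitions, which are assumed here: MRR averages the reciprocal of the rank at which the best answerer appears, and S@N is the fraction of test questions whose best answerer appears among the top N. C@N is not sketched, since its definition is given in [4]. Function names and toy rankings are hypothetical.

```python
def reciprocal_rank(ranked_answerers, best_answerer):
    """1 / rank of the best answerer, or 0.0 if the answerer is not ranked."""
    for position, answerer in enumerate(ranked_answerers, start=1):
        if answerer == best_answerer:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(rankings, best_answerers):
    """Average reciprocal rank over all test questions."""
    total = sum(reciprocal_rank(r, b) for r, b in zip(rankings, best_answerers))
    return total / len(rankings)

def success_at_n(rankings, best_answerers, n):
    """Fraction of questions whose best answerer is within the top n."""
    hits = sum(1 for r, b in zip(rankings, best_answerers) if b in r[:n])
    return hits / len(rankings)

# Toy rankings for three test questions (illustrative only).
rankings = [["a1", "a2", "a3"], ["a2", "a3", "a1"], ["a3", "a1", "a2"]]
best = ["a1", "a1", "a2"]
mrr = mean_reciprocal_rank(rankings, best)
s_at_1 = success_at_n(rankings, best, 1)
```

In the toy data, the best answerer is ranked first, third, and third, so the reciprocal ranks are 1, 1/3, and 1/3.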
From Tables 2, 3, and 4, we find that AEF-ALL obtains the best performance in two comparison scenarios (C@50 and C@200), and AEF-BEST obtains the best performance in each of the other 16 comparison scenarios. We also observe that, in every comparison scenario, the proposed AEF outperforms the baseline EF.
5 Conclusion
Here, we propose an active expert finding method for QR in CQA services. We use the QLL model to measure the expertise of an answerer for a given question. We propose two measures, called activeness score and answering intensity score. These two scores assess the activeness of an answerer and the extent to which the answerer consistently answers questions during a particular hour of the day. Finally, we aggregate the expertise score provided by the QLL model, the activeness score, and the answering intensity score to define an active expert score.
We investigate the performance of the proposed method using a real-world dataset called History, which has been downloaded from the Yahoo! Answers web site. We use three performance measures: MRR, C@N, and S@N. The proposed scheme is found to perform the best in every comparison scenario. However, we have chosen \(\delta \) in a crude way. A further investigation of the parameter \(\delta \) and experiments on more datasets are required.
References
1. Aslay, Ç., O’Hare, N., Aiello, L.M., Jaimes, A.: Competition-based networks for expert finding. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1033–1036. ACM (2013)
2. Bouguessa, M., Dumoulin, B., Wang, S.: Identifying authoritative actors in question-answering forums: the case of Yahoo! answers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, pp. 866–874. ACM (2008)
3. Chang, S., Pal, A.: Routing questions for collaborative answering in community question answering. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 494–501. ACM (2013)
4. Fang, L., Huang, M., Zhu, X.: Question routing in community based QA: incorporating answer quality and answer content. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, MDS 2012, pp. 5:1–5:8. ACM (2012)
5. Jiang, J., Lu, W.: IR-based expert finding using filtered collection. In: 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1–5, October 2008
6. Kundu, D., Mandal, D.P.: Formulation of a hybrid expertise retrieval system in community question answering services. Appl. Intell. 49(2), 463–477 (2019)
7. Li, B., King, I.: Routing questions to appropriate answerers in community question answering services. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1585–1588. ACM (2010)
8. Liu, D.R., Chen, Y.H., Kao, W.C., Wang, H.W.: Integrating expert profile, reputation and link analysis for expert finding in question-answering websites. Inf. Process. Manag. 49(1), 312–329 (2013)
9. Liu, J., Song, Y.I., Lin, C.Y.: Competition-based user expertise score estimation. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, pp. 425–434. ACM (2011)
10. Mandal, D.P., Kundu, D., Maiti, S.: Finding experts in community question answering services: a theme based query likelihood language approach. In: 2015 International Conference on Advances in Computer Engineering and Applications, pp. 423–427, March 2015
11. Neshati, M., Fallahnejad, Z., Beigy, H.: On dynamicity of expert finding in community question answering. Inf. Process. Manag. 53(5), 1026–1042 (2017)
12. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 275–281. ACM (1998)
13. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
14. Riahi, F., Zolaktaf, Z., Shafiei, M., Milios, E.: Finding expert users in community question answering. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012 Companion, pp. 791–798. ACM, New York (2012)
15. Roy, P.K., Singh, J.P., Nag, A.: Finding active expert users for question routing in community question answering sites. In: Perner, P. (ed.) MLDM 2018. LNCS (LNAI), vol. 10935, pp. 440–451. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96133-0_33
16. Yeniterzi, R., Callan, J.: Moving from static to dynamic modeling of expertise for question routing in CQA sites. In: International AAAI Conference on Web and Social Media (2015)
17. Zhai, C., Lafferty, J.: A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst. 22(2), 179–214 (2004)
18. Zhang, J., Ackerman, M.S., Adamic, L.: Expertise networks in online communities: structure and algorithms. In: Proceedings of the 16th International Conference on World Wide Web, pp. 221–230. ACM (2007)
© 2019 Springer Nature Switzerland AG
Kundu, D., Pal, R.K., Mandal, D.P. (2019). Finding Active Experts for Question Routing in Community Question Answering Services. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D., Bora, P., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science(), vol 11942. Springer, Cham. https://doi.org/10.1007/978-3-030-34872-4_36
DOI: https://doi.org/10.1007/978-3-030-34872-4_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34871-7
Online ISBN: 978-3-030-34872-4
eBook Packages: Computer Science (R0)