ABSTRACT
In this paper, we address the problem of expert finding in community question answering (CQA). Most existing approaches attempt to find experts in CQA by means of link analysis techniques. However, these traditional techniques consider only the link structure while ignoring the topical similarity among users (askers and answerers) as well as user expertise and reputation. In this study, we propose a topic-sensitive probabilistic model, an extension of the PageRank algorithm, to find experts in CQA. Compared with traditional link analysis techniques, our proposed method is more effective because it finds experts by taking into account both the link structure and the topical similarity among users. We conduct experiments on a real-world data set from Yahoo! Answers. Experimental results show that our proposed method significantly outperforms traditional link analysis techniques and achieves state-of-the-art performance for expert finding in CQA.
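To illustrate the general idea of combining link structure with topical similarity, the following is a minimal sketch and not the model proposed in the paper, whose details the abstract does not give. It assumes that each user has a topic distribution (for example, one inferred by a topic model), that each asker-to-answerer link is weighted by the cosine similarity of the two users' topic distributions, and that scores are obtained by a PageRank-style power iteration. The function names `cosine` and `topic_sensitive_rank`, the damping value, and the toy data are all hypothetical.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def topic_sensitive_rank(edges, topics, damping=0.85, iters=100):
    """PageRank-style scores on an asker -> answerer graph.

    edges:  list of (asker, answerer) pairs (assumed input format)
    topics: dict mapping every user to a topic distribution (assumed input)
    Each link is weighted by the topical similarity of its endpoints,
    which is an illustrative choice rather than the paper's formulation.
    """
    users = sorted(topics)
    n = len(users)

    # Accumulate similarity-weighted out-links for every asker.
    out = {u: {} for u in users}
    for asker, answerer in edges:
        w = cosine(topics[asker], topics[answerer])
        out[asker][answerer] = out[asker].get(answerer, 0.0) + w

    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in users}
        for u in users:
            total = sum(out[u].values())
            if total == 0.0:
                # No usable out-links: spread this user's score uniformly.
                for v in users:
                    new[v] += damping * rank[u] / n
            else:
                for v, w in out[u].items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return rank


if __name__ == "__main__":
    # Toy data: "a" and "c" ask questions; "b" answers both, "c" answers "a".
    toy_topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
    toy_edges = [("a", "b"), ("a", "c"), ("c", "b")]
    scores = topic_sensitive_rank(toy_edges, toy_topics)
    for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(user, round(score, 4))
```

In this sketch, an answerer gains more score from askers who are topically close to them, so a user who answers many questions within their own topic area ranks above one who answers off-topic questions with the same link count.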
Index Terms
- Topic-sensitive probabilistic model for expert finding in question answer communities