An efficient parallel topic-sensitive expert finding algorithm using spark

Abstract:

Expert finding is an important technique for ranking user authority on community question answering (CQA) websites. ZhihuRank is a topic-sensitive expert finding algorithm based on both LDA and PageRank. As the number of participants and documents on CQA websites grows rapidly, how to parallelize expert finding algorithms for big data analysis has received significant attention. In this paper, we find that the Spark framework, a memory-based parallel computing model that supports complicated iterative algorithms, is more suitable for parallelizing expert finding algorithms than the MapReduce framework. As an example, we parallelize ZhihuRank using MLlib's LDA and GraphX's PageRank in Spark. Experiments were conducted on large-scale real data from Zhihu (the most popular CQA website in China), and the results confirm the effectiveness and scalability of the proposed approach.
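
The abstract does not give ZhihuRank's formulas, so the following is only a minimal single-machine sketch of generic topic-sensitive PageRank, not the authors' method and not Spark code. It assumes that an edge (asker, answerer) counts as an endorsement of the answerer and that each user's weight for one topic (which in the paper's pipeline would plausibly come from LDA) biases the teleport step. The function name, parameters, and sample data are all hypothetical.

```python
def topic_sensitive_pagerank(edges, topic_weight, damping=0.85, iterations=50):
    """Rank users for ONE topic (illustrative sketch, not ZhihuRank itself).

    edges: list of (asker, answerer) pairs; assumed to mean "asker endorses answerer".
    topic_weight: dict user -> non-negative weight of that user for the topic.
    """
    users = sorted({u for edge in edges for u in edge} | set(topic_weight))
    total = sum(topic_weight.get(u, 0.0) for u in users)
    if total <= 0:
        raise ValueError("topic_weight must contain at least one positive weight")

    # Teleport distribution biased toward users active in this topic.
    teleport = {u: topic_weight.get(u, 0.0) / total for u in users}

    out_links = {u: [] for u in users}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = dict(teleport)
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is redistributed via teleport.
        dangling = sum(rank[u] for u in users if not out_links[u])
        new_rank = {u: (1.0 - damping + damping * dangling) * teleport[u] for u in users}
        for u in users:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical toy data: "a" and "b" asked questions that "c" answered; "c" asked one that "a" answered.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
topic_weight = {"a": 0.2, "b": 0.2, "c": 0.6}
scores = topic_sensitive_pagerank(edges, topic_weight)
top_expert = max(scores, key=scores.get)
```

Running one such ranking per topic gives a per-topic authority score for every user. The iterative update is the part that would benefit from an in-memory framework such as Spark, since each pass reuses the graph and the previous rank vector.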

Published in: 2016 IEEE International Conference on Big Data (Big Data)

Date of Conference: 05-08 December 2016

Date Added to IEEE Xplore: 06 February 2017

DOI: 10.1109/BigData.2016.7841019

Conference Location: Washington, DC, USA