AUnet: An Unsupervised Method for Answer Reliability Evaluation in Community QA Systems

Ren, Ruoqing; Duan, Haimeng; Liu, Wenqiang; Liu, Jun

doi:10.1007/978-3-030-01298-4_24


Conference paper
First Online: 21 October 2018


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11268)

Abstract

Recently, cQA websites such as Baidu Zhidao and StackExchange have exploded in popularity, since anyone can post questions for other users to answer, which fully realizes the value of exchange. Nevertheless, because users' backgrounds differ greatly, the answers that different users give to the same question may include errors, irrelevant messages or malicious advertisements. Hence, an automatic method for answer reliability evaluation is very important for improving the user experience. However, existing supervised methods are costly because they need a large amount of annotated data. To alleviate this problem, we propose a novel unsupervised answer evaluation method that exploits an Answer-User association Network. Based on the constructed network, the reliability of answers and users can be obtained simultaneously by an iterative process. The experimental results on real-world datasets show that our proposed method outperforms existing approaches.


1 Introduction

Community question answering (cQA) websites, such as the general-purpose Baidu Zhidao and Yahoo! Answers and the vertical StackExchange and GuoKe, are becoming more and more popular, since anyone can ask, answer, edit, and organize questions on them. Compared with traditional information retrieval techniques, cQA has made headway in solving complex, advice-seeking and reasoning questions thanks to its user-generated content.

The fast-growing crowdsourced Q&A data has good application and development prospects for understanding complex, implicit and self-organized answers. However, data quality problems [1,2,3] persist because answerers' backgrounds differ greatly, and low-quality data means that a portion of the data cannot be applied directly. Hence, automatic answer reliability evaluation is very important for improving the user experience and constructing a high-quality Q&A knowledge base.

However, existing supervised approaches to answer reliability evaluation need large amounts of annotated data, which is time-consuming to produce and limits their applicability to new domains [4]. Existing unsupervised methods, in turn, depend mainly on the answerer's reputation and achieve low accuracy because they consider few factors.

Therefore, high-accuracy unsupervised methods are needed. In this paper, we propose a novel unsupervised method for answer reliability evaluation that constructs an Answer-User association Network (AUnet). This network captures a variety of factors that affect the reliability of an answer. The contributions of this paper are as follows:

We constructed AUnet to capture a variety of factors that affect the reliability of an answer. The answer reliability evaluation problem is then formalized as computing the reliability of node variables on a heterogeneous information network.
We propose a mutual inference algorithm based on AUnet to calculate the answer reliability. The reliability of answers and users is obtained simultaneously by an iterative process, without any annotated data.
We conducted experiments on four real datasets from StackExchange to test the effectiveness of our method. The results show that our method works well.

2 Related work

Our work relates to answer reliability evaluation and to network-based trust propagation algorithms.

Research on evaluating answer reliability is mainly divided into supervised and unsupervised methods. Supervised methods, such as Maximum Entropy in [5], Logistic Regression in [6] and Random Forests in [7], evaluate and predict answer reliability by training a classifier on manually annotated data described by features of the answer, such as community features, user features, textual features and statistical features. Although supervised methods can achieve excellent results, the cost of labeling data is high. Rather than evaluating answer reliability directly, unsupervised methods calculate a user's authority by mining the relations between users, such as the improved PageRank in [8] and the improved HITS in [9]. Besides, Wu et al. [10] achieved the best results among current unsupervised methods, based on the idea of minimizing the difference among answers. Unsupervised methods require no data annotation, but their accuracy is relatively low.

Network-based trust propagation algorithms are used to identify the trustworthiness of nodes in a network. At present, they are mainly used for fraud detection, selection of high-quality comments, and the discovery of authoritative and reliable users [11]. For example, Akoglu et al. [12] iteratively calculated user reliability with a trust propagation algorithm on a bipartite graph, Li et al. [13] used typed Markov Random Fields to detect campaign promoters on social media, and Ko et al. [2] took as the answer reliability the marginal probability of each answer, inferred from the joint probability distribution on the answer association network. As far as we know, no existing method constructs a trust network that simultaneously models multiple factors affecting answer reliability and calculates the answer reliability with a trust propagation algorithm on that network.

3 Approach Overview

3.1 Problem Definition and Data Observation

The problem of evaluating answer reliability is formalized as follows. Given a set of questions $Q=\{q_1,q_2,...,q_n\} $, a set of all answers $A=\displaystyle \bigcup _{i=1}^{n} A_i $, where $A_i=\{a_{i1},a_{i2},...,a_{im_i}\} $ is the set of $m_i $ answers to the question $ q_i\in Q $, and a set of users $U=\{u_1,u_2,...,u_k\} $, our goal is to model the multiple factors that affect answer reliability in a network and output the answer reliability $ \tau \left( a_{ij}\right) $ of each answer $ a_{ij} $.

Definition 1

(Answer Reliability). We let $ \tau \left( a_{ij}\right) $ denote the reliability of the answer $ a_{ij} $, which indicates the extent to which people trust it [14]. We take $ \tau \left( a_{ij}\right) \in \left[ 0,1\right] $; the closer the answer reliability is to 1, the more reliable the answer.

Definition 2

(User Reliability). The user reliability $\omega (u_k) $ of a user $u_k $ indicates the probability that the user provides reliable answers, and $\omega (u_k) \in \left[ 0,1\right] $. The closer the user reliability is to 1, the more reliable the user.

By observing the data, we found that two direct factors and two indirect factors affect answer reliability.

Direct Factors

The number of votes of the answer affects the answer reliability. Answers with more votes tend to be more reliable than those with fewer votes. Because different questions attract different amounts of attention, the share of votes is used instead of the number of votes to represent the degree of support for the answer among all voters on the same question. We let $ fvote_{ij}=\frac{vote_{ij}}{\displaystyle \sum _{j^\prime =1}^{m_i} vote_{ij^\prime }} $ denote the share of votes for the answer $ a_{ij} $, where $ vote_{ij} $ is the number of votes for the answer $ a_{ij} $ and $ m_i $ is the number of answers to question $ q_i $. Our statistics showed that the average share of votes of best answers is clearly higher than that of non-best answers.
The frequency of core words affects the answer reliability. An answer with clearer expression and more information is more likely to be reliable. A sentence is considered to consist of meaningless stop words and informative core words. We use the frequency of core words to represent the amount of information an answer conveys: $ fcore_{ij}=\frac{\sum _{n=1}^{N_{ij}} I\left( w_n\right) }{N_{ij}} $, where $ N_{ij} $ is the number of words in the answer $ a_{ij} $ and $ I\left( w_n\right) $ is an indicator function that equals 1 if the word $ w_n $ is a core word and 0 otherwise. We found that the frequency of core words of best answers is clearly higher than that of non-best answers.
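The two direct factors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list `STOP_WORDS`, the whitespace tokenizer, the equal-share fallback for questions without votes, and the restriction to non-negative vote counts are all choices made here, not details given in the paper.

```python
from typing import List, Set

# Hypothetical stop-word list; the paper does not say which list it used.
STOP_WORDS: Set[str] = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it"}


def vote_shares(votes: List[int]) -> List[float]:
    """Share of votes fvote_ij for every answer to one question.

    Assumes non-negative vote counts.
    """
    if not votes:
        return []
    total = sum(votes)
    if total == 0:
        # Assumption: with no votes at all, every answer gets an equal share.
        return [1.0 / len(votes)] * len(votes)
    return [v / total for v in votes]


def core_word_frequency(answer: str, stop_words: Set[str] = STOP_WORDS) -> float:
    """Frequency of core words fcore_ij: core words divided by all words."""
    words = answer.lower().split()  # naive whitespace tokenizer (assumption)
    if not words:
        return 0.0
    core = sum(1 for w in words if w not in stop_words)
    return core / len(words)


if __name__ == "__main__":
    print(vote_shares([6, 3, 1]))
    print(core_word_frequency("the espresso is bitter"))
```

A real pipeline would likely need a proper tokenizer and a domain-appropriate stop-word list; the point here is only that both factors are normalized per question or per answer, so they are comparable across questions.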

Indirect Factors

Correlation among answers to the same question affects the answer reliability. The reliability of similar answers is positively correlated and mutually reinforcing. If an answer is reliable, answers similar to it are more likely to be reliable, while answers that differ from it are more likely to be unreliable.
Correlation between answers and their corresponding users affects the answer reliability. An answer from a more reliable user is more likely to be reliable, and users who provide reliable answers are more likely to be reliable.

3.2 AUnet Model

Based on the four factors above, we constructed AUnet to model them in a unified framework, drawing on the concept of heterogeneous information networks. The network model is shown in Fig. 1 and is defined as:

$$ G=\{V,E,W,P\} $$

$ V=A \cup U $ is the set of all nodes in AUnet, where $ A=\displaystyle \bigcup _{i=1}^{n} A_i $ is the set of answers to all questions, drawn as blue circles, and $ U=\{u_1,u_2,...,u_k\} $ is the set of all users, drawn as black squares.
$ E=E_p \cup E_s $ is the set of all edges in AUnet. The similarity relations between answers, $ E_s \subseteq A \times A $, are drawn as red undirected edges, and the provided relations between users and answers, $ E_p \subseteq A \times U $, are drawn as black undirected edges.
$ W=\{W_e|e \in E\} $ is the set of edge weights. $ w_s=sim(a_{ij},a_{ij^\prime }) $ is the weight of the similarity relation between answers $ a_{ij} $ and $ a_{ij^\prime } $, with $ w_s\in \left[ 0,1\right] $. In this paper, we adopted sen2vec [15] and cosine similarity to calculate the semantic similarity $ w_s $ between any two answers to the same question. For the provided relation between the user $ u_k $ and the answer $ a_{ij} $, the weight is $ w_p=prd(a_{ij},u_k)=1 $, which means that all answers provided by a user affect that user equally.
$ P=\{priori\left( v\right) |v\in V\} $ is the set of prior reliabilities of the nodes $ v\in V $, with $ priori\left( v\right) \in \left[ 0,1\right] $. The higher the prior reliability, the more reliable the node. The prior reliability of the answer $ a_{ij} $ is defined from the share of votes $ fvote_{ij} $ and the frequency of core words $ fcore_{ij} $: $ priori\left( a_{ij}\right) =\alpha fvote_{ij}+\left( 1-\alpha \right) fcore_{ij} $, where $ \alpha $ is the coefficient that balances the influence of the share of votes against that of the frequency of core words. The prior reliability of the user $ priori\left( u_k\right) $ is defined from the reputation, upvotes, downvotes and homepage views. Pearson correlation analysis showed that user authority is strongly correlated with the number of homepage views. Therefore, the normalized user prior reliability is defined as $ priori\left( u_k\right) =Norm\left( \frac{Reputation}{Views}+Upvote-Downvote\right) $.
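The edge weights and priors defined above might be computed roughly as follows. This is a sketch under assumptions: answer vectors are taken as plain lists of floats (standing in for sen2vec embeddings), negative cosine values are clipped to 0 so that $ w_s $ stays in $ [0,1] $, `Norm` is taken to be min-max normalization, and users with zero views contribute no reputation-per-view term. None of these choices is specified in the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity weight w_s between two answer vectors, clipped to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    # Assumption: clip so the weight respects the stated range [0, 1].
    return max(0.0, min(1.0, dot / (norm_u * norm_v)))


def answer_prior(fvote: float, fcore: float, alpha: float) -> float:
    """priori(a_ij) = alpha * fvote_ij + (1 - alpha) * fcore_ij."""
    return alpha * fvote + (1.0 - alpha) * fcore


def raw_user_score(reputation: float, views: int, upvotes: int, downvotes: int) -> float:
    """Reputation / Views + Upvote - Downvote, before normalization."""
    per_view = reputation / views if views > 0 else 0.0  # zero-view guard (assumption)
    return per_view + upvotes - downvotes


def user_priors(raw_scores: List[float]) -> List[float]:
    """Min-max normalization of raw user scores into [0, 1] (assumed form of Norm)."""
    if not raw_scores:
        return []
    low, high = min(raw_scores), max(raw_scores)
    if high == low:
        return [0.5] * len(raw_scores)  # all users identical: neutral prior (assumption)
    return [(s - low) / (high - low) for s in raw_scores]


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
    print(answer_prior(fvote=0.6, fcore=0.5, alpha=0.6))
    print(user_priors([1.0, 3.0, 5.0]))
```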

4 Mutual Inference Principle

Once the AUnet model is built, the trust propagation algorithm iteratively updates the user reliability and the answer reliability based on the mutual inference principle.

4.1 User Reliability Computing

Compared with reliable users, unreliable users have higher error rates, so the reliability of user $ u_k $ can be inferred from his or her error rate. Assume that the error rate $ \varepsilon \left( u_k\right) $ of the user $ u_k $ follows a normal distribution, $ \varepsilon \left( u_k\right) \sim N\left( 0,\sigma \left( u_k\right) ^2\right) $.

Our goal is to make the variance of the weighted combination of all users' errors, $ \varepsilon _{combine}=\frac{\sum _{u_k\in U} \omega \left( u_k\right) \varepsilon \left( u_k\right) }{\sum _{u_k\in U} \omega \left( u_k\right) }$, as small as possible. Assuming that the users' errors are independent, $ \varepsilon _{combine} $ also follows a normal distribution, $ \varepsilon _{combine}\sim N\left[ 0,\frac{\sum _{u_k\in U} \left( \omega \left( u_k\right) \right) ^2\sigma ^2\left( u_k\right) }{\left( \sum _{u_k\in U} \omega \left( u_k\right) \right) ^2}\right] $. With the constraint $ \sum _{u_k\in U} \omega \left( u_k\right) =1 $, we formulated this goal as the following optimization problem:

$$\begin{aligned} \begin{aligned} \min \limits _{\{\omega \left( u_k\right) \}}\sum _{u_k\in U} \left( \omega \left( u_k\right) \right) ^2\sigma ^2\left( u_k\right) \\ s.t.\sum _{u_k\in U} \omega \left( u_k\right) =1,\omega \left( u_k\right) >0 \end{aligned} \end{aligned}$$

(1)

This optimization problem is convex and can be solved by the method of Lagrange multipliers with a multiplier $ \lambda $. The analytical solution is:

$$\begin{aligned} \omega \left( u_k\right) \propto \frac{1}{\sigma ^2\left( u_k\right) } \end{aligned}$$

(2)
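As a brief check of Eq. (2), the Lagrangian step can be written out explicitly. This is a sketch of the standard argument rather than a derivation taken from the paper; the shorthand $ \omega _k=\omega \left( u_k\right) $ and $ \sigma _k^2=\sigma ^2\left( u_k\right) $ is introduced here only for readability.

$$\begin{aligned} L\left( \omega ,\lambda \right)&=\sum _k \omega _k^2\sigma _k^2+\lambda \left( 1-\sum _k \omega _k\right) ,\\ \frac{\partial L}{\partial \omega _k}&=2\omega _k\sigma _k^2-\lambda =0 \;\Rightarrow \; \omega _k=\frac{\lambda }{2\sigma _k^2},\\ \sum _k \omega _k&=1 \;\Rightarrow \; \lambda =\frac{2}{\sum _j \sigma _j^{-2}},\qquad \omega _k=\frac{\sigma _k^{-2}}{\sum _j \sigma _j^{-2}} \end{aligned}$$

The resulting weights are positive whenever every variance is positive, so the constraint $ \omega \left( u_k\right) >0 $ holds automatically, and a user with a smaller error variance receives a larger weight.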

In Eq. (2), the true variance $ \sigma ^2\left( u_k\right) $ of user $ u_k $ can be estimated by the maximum likelihood estimation as:

$$\begin{aligned} \hat{\sigma }^2\left( u_k\right) =\frac{1}{\left| Q\left( u_k\right) \right| }\sum _{q\in Q\left( u_k\right) } \left( x_q^{u_k}-x_q^*\right) ^2 \end{aligned}$$

(3)

Equation (3) is the mean squared error of the answers given by user $ u_k $, where $ Q\left( u_k\right) $ denotes the set of questions that the user answered. $ x_q^* $ is the best answer for the question q, computed as the average of the users' answers weighted by the answer reliability: $ x_q^*=\frac{\sum _{u_k\in U_q} \tau \left( a_q^{u_k}\right) \cdot x_q^{u_k}}{\sum _{u_k\in U_q} \tau \left( a_q^{u_k}\right) } $.
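Equations (2) and (3) and the weighted best answer $ x_q^* $ can be illustrated with a small Python sketch. It assumes that each answer value $ x_q^{u_k} $ is a single number, which the paper does not state explicitly, and it adds a small constant `eps` to avoid division by zero for users with no observed error; both are assumptions of this example.

```python
from typing import Dict, List, Tuple


def weighted_best_answer(values: List[float], reliabilities: List[float]) -> float:
    """x_q^*: average of the users' answer values weighted by answer reliability."""
    total = sum(reliabilities)
    return sum(t * x for t, x in zip(reliabilities, values)) / total


def estimated_variance(pairs: List[Tuple[float, float]]) -> float:
    """Eq. (3): mean squared difference between a user's value and x_q^*.

    Each pair is (x_q^{u_k}, x_q^*) for one question answered by the user.
    """
    return sum((x - x_star) ** 2 for x, x_star in pairs) / len(pairs)


def user_weights(variances: Dict[str, float], eps: float = 1e-9) -> Dict[str, float]:
    """Eq. (2): weights proportional to 1 / variance, normalized to sum to 1."""
    inverse = {user: 1.0 / (var + eps) for user, var in variances.items()}
    total = sum(inverse.values())
    return {user: w / total for user, w in inverse.items()}


if __name__ == "__main__":
    x_star = weighted_best_answer([1.0, 0.0], [0.75, 0.25])
    print(x_star)
    variances = {
        "u1": estimated_variance([(0.9, 1.0), (0.8, 1.0)]),
        "u2": estimated_variance([(0.2, 1.0), (0.4, 1.0)]),
    }
    print(user_weights(variances))
```

In this toy run the user whose values stay close to $ x_q^* $ ends up with most of the weight, which is the behaviour Eq. (2) describes.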

According to our statistics, most users give only a few answers, and estimating a user's theoretical variance $ \sigma ^2\left( u_k\right) $ by $ \hat{\sigma }^2\left( u_k\right) $ is inaccurate when the user has provided only a small number of answers. We addressed this long-tail problem by using a confidence interval of the variance instead of a single point estimate, following the work in [16]. Finally, the user reliability at a given confidence level can be computed as follows, where $ \chi _{1-\frac{\alpha }{2}}^2\left( \left| Q\left( u_k\right) \right| \right) $ is a quantile of the chi-squared distribution with $ \left| Q\left( u_k\right) \right| $ degrees of freedom and $ \alpha $ here is the significance level:

$$\begin{aligned} \omega ^\prime \left( u_k\right) \propto \frac{1}{\sigma ^2\left( u_k\right) }=\frac{\chi _{1-\frac{\alpha }{2}}^2\left( \left| Q\left( u_k\right) \right| \right) }{\sum _{q\in Q\left( u_k\right) } \left( x_q^{u_k}-x_q^*\right) ^2} \end{aligned}$$

(4)

4.2 Answer Reliability Computing

The answer reliability is affected by the user reliability and by the peer answers to the same question [10]. For each question, we can extract from AUnet an undirected subgraph consisting of the answers to that question and the corresponding users. We then transformed the answer reliability problem into that of computing the joint probability distribution of the nodes in this undirected probabilistic subgraph. For an undirected graph with n random variables, the joint probability distribution can be represented as follows:

$$\begin{aligned} P(X)=\frac{1}{Z}\displaystyle \prod _{c\in C}\psi _c\left( X_c\right) \end{aligned}$$

(5)

$$\begin{aligned} Z=\displaystyle \sum _X\displaystyle \prod _{c\in C}\psi _c\left( X_c\right) \end{aligned}$$

(6)

In Eqs. (5) and (6), $ \psi _c\left( X_c\right) =exp\{-E\left( X_c\right) \} $ is the potential function of clique c, Z is the normalization factor, and the energy function $ E\left( X_c\right) $ represents the correlation between variables. Following Boltzmann machines, the probability of the hidden variable $ y_{ij} $ of the answer $ a_{ij} $ and the probability of the hidden variable $ y_k $ of the user $ u_k $ are defined as follows:

$$\begin{aligned} P\left( y_{ij}\right)= & {} {\left\{ \begin{array}{ll} \tau \left( a_{ij}\right) , &{} \text{ if } y_{ij}=1 \\ 1-\tau \left( a_{ij}\right) , &{} \text{ if } y_{ij}=0 \end{array}\right. }\\ P\left( y_k\right)= & {} {\left\{ \begin{array}{ll} \omega \left( u_k\right) , &{} \text{ if } y_k=1 \\ 1-\omega \left( u_k\right) , &{} \text{ if } y_k=0 \end{array}\right. } \end{aligned}$$

(7)

Generally, obtaining the joint probability distribution on an undirected probabilistic graph is an NP-hard problem [17]. Using iterated conditional modes (ICM) [18], we updated the value of each answer node variable in the undirected subgraph step by step, based on the idea of gradient ascent, as follows:

$$\begin{aligned} P\left( y_{ij}=\eta \right) =P\left( y_k=\eta \right) +\displaystyle \sum _{y_{ij^\prime }\in N\left( y_{ij}\right) }m_{ij^\prime \rightarrow ij}\left( y_{ij}=\eta \right) \end{aligned}$$

(8)

$$\begin{aligned} m_{ij^\prime \rightarrow ij}\left( y_{ij}\right) =\displaystyle \sum _{y_{ij^\prime }}U\left( y_{ij^\prime },y_{ij}\right) P\left( y_{ij^\prime }\right) \end{aligned}$$

(9)

$$\begin{aligned} U\left( y_{ij^\prime },y_{ij}\right) =\left[ sim\left( a_{ij},a_{ij^\prime }\right) \right] ^{I\left( y_{ij^\prime },y_{ij}\right) }\cdot \left[ 1-sim\left( a_{ij},a_{ij^\prime }\right) \right] ^{1-I\left( y_{ij^\prime },y_{ij}\right) } \end{aligned}$$

(10)

We let $ y_{ij^\prime }\in \{0,1\} $ denote the trustiness transmitted by $ a_{ij^\prime } $ to $ a_{ij} $. $ U\left( y_{ij^\prime },y_{ij}\right) $ is the potential function and $ sim\left( a_{ij},a_{ij^\prime }\right) $ denotes the similarity between the two answers. When the reliabilities of similar answers to the same question are consistent, the energy needed for transmission is small and transmission happens easily. In contrast, when the reliabilities of similar answers are inconsistent, the energy needed is large and transmission is unlikely.
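One update step of Eqs. (8) to (10) for a single answer could look like the following Python sketch. Two points are assumptions of this example rather than statements from the paper: the indicator $ I\left( y_{ij^\prime },y_{ij}\right) $ is read as 1 when the two labels agree and 0 otherwise, and the two scores for $ \eta =1 $ and $ \eta =0 $ are renormalized so that the result can be used as a reliability in $ [0,1] $.

```python
from typing import List


def potential(sim: float, y_peer: int, y: int) -> float:
    """Eq. (10), reading I(y', y) as 1 when the labels agree (assumption)."""
    return sim if y_peer == y else 1.0 - sim


def message(sim: float, tau_peer: float, y: int) -> float:
    """Eq. (9): sum over y' in {0, 1} of U(y', y) * P(y')."""
    p_peer = {1: tau_peer, 0: 1.0 - tau_peer}
    return sum(potential(sim, y_peer, y) * p_peer[y_peer] for y_peer in (0, 1))


def update_answer_reliability(
    omega_user: float, peer_sims: List[float], peer_taus: List[float]
) -> float:
    """Eq. (8) for eta = 1 and eta = 0, followed by renormalization (assumption)."""
    scores = {}
    for eta in (0, 1):
        p_user = omega_user if eta == 1 else 1.0 - omega_user
        scores[eta] = p_user + sum(
            message(sim, tau, eta) for sim, tau in zip(peer_sims, peer_taus)
        )
    return scores[1] / (scores[0] + scores[1])


if __name__ == "__main__":
    # An answer from a neutral user, with one very similar and reliable peer answer.
    print(update_answer_reliability(0.5, peer_sims=[0.9], peer_taus=[0.9]))
```

With these readings, a similar and reliable peer answer pushes the updated reliability above the user-only value, while a similar but unreliable peer pulls it down, which matches the behaviour described in the paragraph above.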

5 Experiments

5.1 Datasets and Experimental Settings

To evaluate the effectiveness of the proposed algorithm, we conducted experiments on datasets from four domains of the vertical cQA site StackExchange: coffee, movie, music and sports.

The statistics of the four datasets are shown in the first six columns of Table 1. To ensure the quality of our dataset, only questions with more than 3 answers were selected.

Table 1. Experimental data statistics.


The StackExchange dataset only marks the best answer to each question and makes no judgement on the reliability of the other answers. However, answers in cQA are often diverse, so it is not objective to treat all other answers directly as negative samples, which would also cause an imbalance between positive and negative examples. Therefore, we randomly selected 50 questions from each of the four domains, totaling 200 questions and 1037 answers, and let two volunteers annotate the answer reliability according to the best answer and relevant information. Each volunteer annotated 125 questions, and every answer was annotated as “Yes” (reliable) or “No” (not reliable). After verifying the consistency of the labeling results, the final statistics for all domains are shown in the last two columns of Table 1.

All the experiments were conducted on a server equipped with a four-core Intel Core i7-4790 CPU, 16 GB RAM and the 64-bit Windows 10 operating system.

5.2 Baseline and Metrics

Four methods, Vote, LR, TDM and LQ, are selected as baselines in this paper.

Vote, the basic voting method, ranks answers directly by their number of votes.
LR, proposed by Shah et al. [6], trains a logistic regression model on non-textual information of answers to evaluate and predict answer reliability in cQA. The output of LR is a trust value of the answer between 0 and 1.
TDM, a method proposed in [19] based on the iterative idea of truth discovery, estimates the trustworthiness of the answer. TDM smooths long-tail users with the prior reliability of the user, and it uses basic iterative methods to update the user reliability and the answer reliability.
LQ, an unsupervised answer reliability evaluation method proposed in [10], detects low-quality answers using the relation between peer answers and label answers by minimizing the variance of the question. We represented each answer by 121 relevant features of 5 types, including the statistical and textual features of the answer, user features and similarity features between peer answers.

For the evaluation of answer reliability, we focused on whether the model can effectively filter and return reliable answers, that is, whether the top few answers in the list presented to the user are the more reliable ones. Therefore, we evaluated the performance of the five models using MRR and MAP, two indicators commonly used in information retrieval and question answering.

MRR (Mean Reciprocal Rank) measures the average of the reciprocal of the position of the best answer in the answer list, which is defined as follows:

$$\begin{aligned} MRR=\frac{1}{\left| Q\right| } \displaystyle \sum _{q\in Q} \frac{1}{bp_q} \end{aligned}$$

(11)

where $ \left| Q\right| $ is the total number of questions, and $ bp_q $ is the position of the most reliable answer in the ranked answer list of question q. MRR measures whether the algorithm ranks the best answer near the top of the list.
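Equation (11) translates directly into code. The sketch below assumes 1-based positions and takes as input one best-answer position per question; the function name is chosen here for illustration.

```python
from typing import List


def mean_reciprocal_rank(best_positions: List[int]) -> float:
    """Eq. (11): average of 1 / bp_q over all questions (1-based positions)."""
    if not best_positions:
        raise ValueError("at least one question is required")
    return sum(1.0 / p for p in best_positions) / len(best_positions)


if __name__ == "__main__":
    # Best answer ranked 1st, 2nd and 4th for three questions.
    print(mean_reciprocal_rank([1, 2, 4]))
```

For the three questions in the usage example the value is (1 + 1/2 + 1/4) / 3, roughly 0.583.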

MAP (Mean Average Precision) measures the average precision of the answer ranking over all questions. That is to say, it takes into account not only the position of the most reliable answer but also the positions of the other reliable answers in the final ranking. MAP is defined as follows:

$$\begin{aligned} MAP=\frac{1}{\left| Q\right| } \displaystyle \sum _{q\in Q}\left( \frac{1}{TN_q}\displaystyle \sum _{i=1}^{TN_q} \frac{i}{p_i}\right) \end{aligned}$$

(12)

where $ TN_q $ is the number of reliable answers labeled as positive samples of the question q, and $ p_i $ is the position of the ith reliable answer in the final ranking result.
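Equation (12) can be sketched in the same way. The input is assumed to be, for each question, the list of 1-based positions at which the reliable answers appear in the final ranking; the function names are illustrative only.

```python
from typing import List


def average_precision(reliable_positions: List[int]) -> float:
    """Inner term of Eq. (12): (1 / TN_q) * sum over i of i / p_i."""
    ordered = sorted(reliable_positions)  # p_1 < p_2 < ... < p_TN
    return sum(i / p for i, p in enumerate(ordered, start=1)) / len(ordered)


def mean_average_precision(per_question: List[List[int]]) -> float:
    """Eq. (12): mean of the per-question average precision."""
    return sum(average_precision(pos) for pos in per_question) / len(per_question)


if __name__ == "__main__":
    # Question 1: reliable answers at ranks 1 and 3; question 2: one at rank 2.
    print(mean_average_precision([[1, 3], [2]]))
```

For the first question the inner term is (1/1 + 2/3) / 2, so a reliable answer pushed down to rank 3 lowers the score even though the top answer is correct.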

5.3 Performance and Results Analysis

The main parameters of our AUnet method are the window size of sen2vec and the coefficient $ \alpha $ used in the prior reliability of the answer. The DM model of sen2vec is adopted to represent each answer as a 300-dimensional vector, with a window size of 5. In our experiments, the best value of $ \alpha $ is 0.6 in the coffee domain, 0.7 in the movie domain, 0.6 in the music domain and 0.5 in the sports domain.

We first verified the convergence of the algorithm. Figure 2 shows how the cumulative value of the answer reliability changes from iteration to iteration. When the reliability of every answer changes by less than 0.001 between two consecutive iterations, the algorithm is considered to have reached a steady state.
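The stopping rule can be sketched as a generic fixed-point loop. The `update` callable stands in for one full round of mutual inference and is hypothetical, as are the function name and the `max_iter` safety limit; only the 0.001 threshold comes from the text.

```python
from typing import Callable, Dict, Tuple

Reliabilities = Dict[str, float]


def iterate_until_stable(
    update: Callable[[Reliabilities], Reliabilities],
    initial: Reliabilities,
    tol: float = 1e-3,
    max_iter: int = 100,
) -> Tuple[Reliabilities, int]:
    """Repeat `update` until every answer reliability changes by less than `tol`.

    Returns the final reliabilities and the number of iterations performed.
    """
    current = dict(initial)
    for iteration in range(1, max_iter + 1):
        updated = update(current)
        change = max((abs(updated[k] - current[k]) for k in current), default=0.0)
        current = updated
        if change < tol:
            return current, iteration
    return current, max_iter  # safety limit reached without meeting the threshold


if __name__ == "__main__":
    # Toy update that moves every value halfway towards 0.8.
    def toy(state: Reliabilities) -> Reliabilities:
        return {k: 0.5 * v + 0.4 for k, v in state.items()}

    print(iterate_until_stable(toy, {"a1": 0.0, "a2": 1.0}))
```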

As Fig. 2 shows, the data of all four domains reach a steady state after 15 iterations in the experiment. Among them, the Music domain converges noticeably faster than the other domains. This is because the number of answers per user in the Music domain is relatively large, and the number of answers per question is also large.

The MRR and MAP of five models in four domains are shown in Table 2.

Table 2. MRR(%) and MAP of five models in four domains


From the MRR results, we can see that the voting method alone can pick out about 70% of the best answers. After adding user information and statistical information, the trained LR method can pick out about 80% of the best answers. To improve, to some extent, the ability to pick out the best answer when votes are few, TDM smooths the long-tail users and LQ introduces the similarity relation between peer answers. AUnet achieves the best screening ability in all four domains, and it effectively returns the best answer for more than 86% of the questions.

The MAP results report the average ranking precision over all questions in each domain and measure how accurately the algorithm returns reliable answers.

On the whole, Vote, which considers only the number of votes, has the worst average ranking performance. This is because the number of votes can be affected by factors such as posting time and malicious voting, and answer reliability cannot be evaluated effectively without considering other factors. Because LR considers the statistical features of the answer and user features in addition to the community features, it performs slightly better than Vote. However, because a large share of the answers come from long-tail users and have few votes, LR predicts poorly for answers with sparse features. This can cause some reliable answers to be ranked low, so the MAP of LR is below 80% in every domain except Movie. TDM uses an iterative method to evaluate answer reliability, and its MAP is about 83%, which is stable and unaffected by community information. LQ adds similarity features and textual features of the answer to those of LR and can effectively filter low-quality answers, so its MAP is considerably higher than that of the other three baselines. AUnet simultaneously models the relations between users and answers and uses the prior reliability based on community and statistical information, achieving the highest average ranking precision in all four domains. For 90% of the questions, the top three answers returned by AUnet are all reliable. In addition, AUnet achieves its largest performance improvement in the Music domain. That is to say, when the number of answers per question and the number of answers per user are large, AUnet evaluates answer reliability significantly better than feature-based methods.

6 Conclusion

To alleviate the high cost of labeling data in supervised methods and the low performance of unsupervised methods, we proposed an unsupervised method based on AUnet to evaluate answer reliability. On the basis of the probabilistic graphical model and the mutual inference algorithm, our AUnet method calculates the answer reliability and the user reliability simultaneously without supervision and automatically ranks the answers in cQA. Experiments on four StackExchange domains verified the convergence and effectiveness of our algorithm and showed that our method is superior to the other methods both in screening the best answer and in discriminating between reliable and unreliable answers. A potential direction for future research is evaluating answer reliability under multi-source conflict.


References

1. Yao, Y., Tong, H., Xie, T., Akoglu, L., Xu, F., Lu, J.: Detecting high-quality posts in community question answering sites. Inf. Sci. 302(C), 70–82 (2015)
2. Ko, J., Nyberg, E., Luo, S.: A probabilistic graphical model for joint answer ranking in question answering. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 343–350 (2007)
3. Nie, L., Wei, X., Zhang, D., Wang, X., Gao, Z., Yang, Y.: Data-driven answer selection in community qa systems. IEEE Trans. Knowl. Data Eng. 29(6), 1186–1198 (2017)
4. Tymoshenko, K., Bonadiman, D., Moschitti, A.: Learning to rank non-factoid answers: comment selection in web forums. In: ACM International on Conference on Information and Knowledge Management, pp. 2049–2052 (2016)
5. Jeon, J., Croft, W.B., Lee, J.H., Park, S.: A framework to predict the quality of answers with non-textual features. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 228–235 (2006)
6. Shah, C., Pomerantz, J.: Evaluating and predicting answer quality in community qa. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 411–418 (2010)
7. Dalip, D.H., Cristo, M., Calado, P.: Exploiting user feedback to learn to rank answers in q & a forums: a case study with stack overflow. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 543–552 (2013)
8. Zhang, J., Ackerman, M.S., Adamic, L.: Expertise networks in online communities: structure and algorithms. In: International Conference on World Wide Web, pp. 221–230 (2007)
9. Jurczyk, P., Agichtein, E.: Discovering authorities in question answer communities by using link analysis, pp. 919–922 (2007)
10. Wu, H., Tian, Z., Wu, W., Chen, E.: An unsupervised approach for low-quality answer detection in community question-answering. In: Candan, S., Chen, L., Pedersen, T.B., Chang, L., Hua, W. (eds.) DASFAA 2017. LNCS, vol. 10178, pp. 85–101. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55699-4_6
11. Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov. 29(3), 626–688 (2014)
12. Akoglu, L., Chandy, R., Faloutsos, C.: Opinion fraud detection in online reviews by network effects, pp. 2–11 (2013)
13. Li, H., Mukherjee, A., Liu, B., Kornfield, R., Emery, S.: Detecting campaign promoters on twitter using markov random fields. In: IEEE International Conference on Data Mining, pp. 290–299 (2014)
14. Fogg, B.J., Tseng, H.: The elements of computer credibility. In: Proceeding of the CHI ’99 Conference on Human Factors in Computing Systems: the CHI Is the Limit, Pittsburgh, Pa, USA, pp. 80–87 (May 1999)
15. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents, vol. 4, pp. II-1188 (2014)
16. Li, Q., et al.: A confidence-aware approach for truth discovery on long-tail data. Proc. VLDB Endow. 8(4), 425–436 (2014)
17. Liu, W., Liu, J., Duan, H., Hu, W., Wei, B.: Exploiting source-object networks to resolve object conflicts in linked data. In: Blomqvist, E., Maynard, D., Gangemi, A., Hoekstra, R., Hitzler, P., Hartig, O. (eds.) ESWC 2017. LNCS, vol. 10249, pp. 53–67. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58068-5_4
18. Kittler, J., Hatef, M., Duin, R.P.W.: Combining classifiers. In: International Conference on Pattern Recognition, vol. 2, pp. 897–901 (1998)
19. Li, Y., et al.: Extracting medical knowledge from crowdsourced question answering website. IEEE Trans. Big Data PP(99), 1–1 (2016)


Acknowledgments

This work was supported by National Key Research and Development Program of China (2018YFB1004500), Science and Technology Planning Project of Guangdong Province, China (2017A010101029), National Natural Science Foundation of China (61532015, 61532004, 61672419, and 61672418), Innovative Research Group of the National Natural Science Foundation of China(61721002), Innovation Research Team of Ministry of Education (IRT_17R86), Project of China Knowledge Centre for Engineering Science and Technology, and Teaching Reform Project of XJTU (No. 17ZX044).

Author information

Authors and Affiliations

Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Tech. R&D, Xi’an Jiaotong University, Xi’an, 710049, Shaanxi, China
Ruoqing Ren, Haimeng Duan, Wenqiang Liu & Jun Liu
School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, Shaanxi, China
Ruoqing Ren, Haimeng Duan & Wenqiang Liu
Guangdong Xian Jiaotong University Academy, Xi’an, 528300, China
Jun Liu


Corresponding author

Correspondence to Ruoqing Ren.

Editor information

Editors and Affiliations

University of Macau, Macao, China
Leong Hou U
Education University of Hong Kong, Hong Kong, China
Haoran Xie

About this paper

Cite this paper

Ren, R., Duan, H., Liu, W., Liu, J. (2018). AUnet: An Unsupervised Method for Answer Reliability Evaluation in Community QA Systems. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_24


DOI: https://doi.org/10.1007/978-3-030-01298-4_24
Published: 21 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01297-7
Online ISBN: 978-3-030-01298-4
eBook Packages: Computer Science, Computer Science (R0)
