Abstract
Community Question Answering (CQA) sites such as Yahoo! Answers offer a rich body of knowledge to their users. However, the quality of answers posted to CQA sites varies widely, from precise and useful answers to irrelevant and useless ones. Automatically detecting low-quality answers would therefore help site managers organize the accumulated knowledge efficiently and provide high-quality content to users. In this paper, we propose a novel unsupervised approach to detecting low-quality answers at a CQA site. The key ideas of our model are: (1) most answers are normal; (2) a low-quality answer can be found by comparing it with its “peer” answers under the same question; (3) different questions have different answer quality criteria. Based on these ideas, we devise an unsupervised learning algorithm that assigns soft labels to answers as quality scores. Experiments show that our model significantly outperforms other state-of-the-art models on answer quality prediction.
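The three ideas above can be illustrated with a small scoring routine. What follows is a minimal sketch under our own assumptions, not the model proposed in the paper: each answer is represented by a hypothetical numeric feature vector, an answer's soft score is exp(-d/s), where d is its distance to the weighted centroid of its peer answers and s is a per-question scale, and the scores are re-estimated until they change by less than `eps` or `max_iter` rounds pass. The defaults `eps = 1e-5` and `max_iter = 200` mirror the ε and N reported in the notes, assuming those are a convergence tolerance and an iteration cap. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
MIN_WEIGHT = 1e-12  # floor that keeps every peer set's total weight positive


def _distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def peer_scores(answers: List[Vector], eps: float = 1e-5, max_iter: int = 200) -> List[float]:
    """Soft normality scores in (0, 1] for the answers to ONE question.

    Hypothetical scheme, not the paper's model: each answer is compared with the
    weighted mean of its peers (the other answers to the same question), and the
    weights are re-estimated from those distances until they stop changing.
    """
    n = len(answers)
    if n < 2:
        return [1.0] * n  # an answer without peers cannot be judged against them
    dim = len(answers[0])
    weights = [1.0] * n  # idea (1): start by treating every answer as normal
    for _ in range(max_iter):
        dists = []
        for i, a in enumerate(answers):
            peers = [j for j in range(n) if j != i]
            peer_w = sum(weights[j] for j in peers)
            # idea (2): weighted centroid of the peer answers of answer i
            centroid = [
                sum(weights[j] * answers[j][k] for j in peers) / peer_w
                for k in range(dim)
            ]
            dists.append(_distance(a, centroid))
        # idea (3): a per-question scale, so each question sets its own yardstick
        scale = sum(w * d for w, d in zip(weights, dists)) / sum(weights)
        if scale == 0.0:
            return [1.0] * n  # all answers coincide; nothing stands out
        new_weights = [max(math.exp(-d / scale), MIN_WEIGHT) for d in dists]
        change = max(abs(x - y) for x, y in zip(new_weights, weights))
        weights = new_weights
        if change < eps:
            break
    return weights


def score_all(questions: Dict[str, List[Vector]], **kwargs) -> Dict[str, List[float]]:
    """Score every question's answers independently of the other questions."""
    return {qid: peer_scores(ans, **kwargs) for qid, ans in questions.items()}
```

Scores near 1 mark answers that resemble their peers; scores near 0 flag candidate low-quality answers. Normalizing by a per-question scale rather than a global threshold is one simple way to let each question set its own quality criterion; where to cut the scores is left open in this sketch.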
Notes
- 2. We set \(\epsilon = 0.00001\) and \(N = 200\) in our experiments.
- 4. The features Q_u_other and A_u_other in Table 2 can be traced only in the Yahoo dataset.
- 14. c: trade-off between training error and margin. j: cost factor by which training errors on positive examples outweigh those on negative examples. b: whether to use a biased hyperplane.
- 15. To save space, we report only the results on the Qatar dataset; the results on the Fatwa and Yahoo datasets show similar trends.
- 16. “Non-English” and “Other” answers are merged into the “Irrelevant” category.
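Note 14 names the classifier parameters c, j and b without giving the objective they control. Their descriptions match the `-c`, `-j` and `-b` options of Joachims' SVMlight `svm_learn`, so we assume that tool was used. The sketch below only evaluates the primal soft-margin objective those three settings shape; it does not train a model, and the function name and signature are hypothetical.

```python
from typing import Sequence


def svm_objective(
    w: Sequence[float],
    bias: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    c: float,
    j: float = 1.0,
    biased: bool = True,
) -> float:
    """Primal soft-margin SVM objective with an asymmetric error cost.

    c weighs the total training error against the margin term, j multiplies
    the hinge loss of positive examples (label +1), and biased=False fixes
    the offset at 0 so the hyperplane passes through the origin.
    """
    b = bias if biased else 0.0
    margin_term = 0.5 * sum(wk * wk for wk in w)
    error = 0.0
    for xi, yi in zip(X, y):
        score = sum(wk * xk for wk, xk in zip(w, xi)) + b
        hinge = max(0.0, 1.0 - yi * score)  # slack of this training example
        error += (j if yi > 0 else 1.0) * hinge  # positives cost j times more
    return margin_term + c * error
```

For example, with `w = [1.0]`, no bias, a positive example at 0.5 and a negative one at -2.0, only the positive example has nonzero slack (0.5), so raising j from 1 to 2 raises the objective from 1.0 to 1.5. A j above 1 is a common way to counter class imbalance, such as a minority of low-quality answers.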
Acknowledgements
This research was partially supported by grants from the National Key Research and Development Program of China (Grant No. 2016YFB1000904), the National Science Foundation for Distinguished Young Scholars of China (Grant No. 61325010), the National Natural Science Foundation of China (Grant No. 61672483), and the Fundamental Research Funds for the Central Universities of China (Grant No. WK2350000001).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wu, H., Tian, Z., Wu, W., Chen, E. (2017). An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_6
DOI: https://doi.org/10.1007/978-3-319-55699-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55698-7
Online ISBN: 978-3-319-55699-4
eBook Packages: Computer Science, Computer Science (R0)