An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering

Wu, Haocheng; Tian, Zuohui; Wu, Wei; Chen, Enhong

doi:10.1007/978-3-319-55699-4_6


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10178)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

Community Question Answering (CQA) sites such as Yahoo! Answers provide rich knowledge for people to access. However, the quality of answers posted to CQA sites varies widely, from precise and useful to irrelevant and useless. Hence, automatic detection of low-quality answers will help site managers efficiently organize the accumulated knowledge and provide high-quality content to users. In this paper, we propose a novel unsupervised approach to detect low-quality answers at a CQA site. The key ideas in our model are: (1) most answers are normal; (2) a low-quality answer can be found by comparing it with its “peer” answers under the same question; (3) different questions have different answer quality criteria. Based on these ideas, we devise an unsupervised learning algorithm to assign soft labels to answers as quality scores. Experiments show that our model significantly outperforms other state-of-the-art models on answer quality prediction.
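
As a rough illustration of ideas (1) to (3), and not the algorithm proposed in the paper, the sketch below scores each answer by how far it deviates from its peers under the same question. Everything in it is an assumption made for illustration: the function name `peer_deviation_scores`, the single numeric feature per answer, and the use of the median and median absolute deviation as a per-question notion of "normal".

```python
from statistics import median


def peer_deviation_scores(answers_by_question):
    """Return soft scores in (0, 1] per answer; lower means more unusual.

    answers_by_question maps a question id to a list of numeric feature
    values, one per answer (a hypothetical feature, e.g. overlap with the
    question text). This is an illustrative heuristic, not the paper's model.
    """
    scores = {}
    for qid, features in answers_by_question.items():
        # Idea (1): most answers are normal, so the median is a robust centre.
        centre = median(features)
        deviations = [abs(f - centre) for f in features]
        # Idea (3): the scale is computed per question, so each question
        # effectively gets its own quality criterion.
        scale = max(median(deviations), 1e-9)
        # Idea (2): an answer is judged only against its peer answers.
        scores[qid] = [1.0 / (1.0 + d / scale) for d in deviations]
    return scores


if __name__ == "__main__":
    demo = {"q1": [0.8, 0.7, 0.75, 0.1], "q2": [0.5, 0.5, 0.5]}
    for qid, question_scores in peer_deviation_scores(demo).items():
        print(qid, [round(s, 3) for s in question_scores])
```

In this toy input the fourth answer to `q1` sits far from its peers and receives the lowest score, while the identical answers to `q2` all receive the maximum score. Note that this heuristic penalizes deviation in either direction, so an unusually good answer would also score low; a real system would need features and a model that separate the two cases.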


Notes

1. https://answers.yahoo.com/question/index?qid=20090408172834AArbCtu
2. We set \(\epsilon = 0.00001\) and \(N = 200\) in our experiments.
3. http://alt.qcri.org/semeval2015/task3
4. The features Q_u_other and A_u_other in Table 2 are only traceable in the Yahoo dataset.
5. http://developer.yahoo.com/answers
6. http://lucene.apache.org/
7. http://www.ranks.nl/stopwords
8. http://gibbslda.sourceforge.net/
9. https://code.google.com/archive/p/word2vec/
10. http://www.statmt.org/moses/giza/GIZA++.html
11. http://nlp.stanford.edu/software/tagger.shtml
12. http://nlp.stanford.edu/software/lex-parser.shtml
13. http://www.cs.cmu.edu/~alavie/METEOR/
14. c: trade-off between training error and margin. j: cost factor by which training errors on positive examples outweigh errors on negative examples. b: whether to use a biased hyperplane.
15. To save space, we report only the results on the Qatar dataset. The results on the Fatwa and Yahoo datasets show similar trends.
16. “Non-English” and “Other” answers are categorized as “Irrelevant” answers.


Acknowledgements

This research was partially supported by grants from the National Key Research and Development Program of China (Grant No. 2016YFB1000904), the National Science Foundation for Distinguished Young Scholars of China (Grant No. 61325010), the National Natural Science Foundation of China (Grant No. 61672483), and the Fundamental Research Funds for the Central Universities of China (Grant No. WK2350000001).

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Haocheng Wu & Enhong Chen
Harbin Institute of Technology, Harbin, China
Zuohui Tian
Microsoft Research, Beijing, China
Wei Wu


Corresponding author

Correspondence to Enhong Chen.

Editor information

Editors and Affiliations

Arizona State University, Tempe - Phoenix, Arizona, USA
Selçuk Candan
Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Aalborg University, Aalborg, Denmark
Torben Bach Pedersen
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua


About this paper

Cite this paper

Wu, H., Tian, Z., Wu, W., Chen, E. (2017). An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_6


DOI: https://doi.org/10.1007/978-3-319-55699-4_6
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55698-7
Online ISBN: 978-3-319-55699-4
eBook Packages: Computer Science, Computer Science (R0)
