
An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10178)

Abstract

Community Question Answering (CQA) sites such as Yahoo! Answers provide rich knowledge for people to access. However, the quality of answers posted to CQA sites varies widely, from precise and useful to irrelevant and useless. Automatic detection of low-quality answers therefore helps site managers organize the accumulated knowledge efficiently and provide high-quality content to users. In this paper, we propose a novel unsupervised approach to detect low-quality answers at a CQA site. The key ideas in our model are: (1) most answers are normal; (2) a low-quality answer can be found by comparing it with its “peer” answers under the same question; (3) different questions have different answer quality criteria. Based on these ideas, we devise an unsupervised learning algorithm that assigns soft labels to answers as quality scores. Experiments show that our model significantly outperforms other state-of-the-art models on answer quality prediction.
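The three ideas in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a generic iteratively reweighted peer comparison written only to make the ideas concrete. The function name `peer_quality_scores`, the feature vectors, and the weighting formula are all hypothetical. The defaults `eps=1e-5` and `max_iter=200` borrow the values of ε and N from the notes, but treating them as a convergence tolerance and an iteration cap is an assumption.

```python
import math
from statistics import median


def peer_quality_scores(answers_by_question, eps=1e-5, max_iter=200):
    """Assign each answer a soft label in (0, 1]; higher means closer to its peers.

    answers_by_question: dict mapping a question id to a list of feature
    vectors (lists of floats), one vector per answer. Hypothetical input format.
    """
    scores = {}
    for qid, vectors in answers_by_question.items():
        dim = len(vectors[0])
        # Idea (1): start by treating every answer as normal (weight 1).
        weights = [1.0] * len(vectors)
        for _ in range(max_iter):
            total = sum(weights)
            # Idea (2): compare each answer with the weighted centre of its
            # peers under the same question.
            center = [
                sum(w * v[d] for w, v in zip(weights, vectors)) / total
                for d in range(dim)
            ]
            dists = [math.dist(v, center) for v in vectors]
            # Idea (3): the scale is estimated per question, so each question
            # gets its own criterion for what counts as far from the peers.
            scale = median(dists) or 1.0
            new_weights = [1.0 / (1.0 + (d / scale) ** 2) for d in dists]
            change = max(abs(a - b) for a, b in zip(new_weights, weights))
            weights = new_weights
            if change < eps:  # assumed role of epsilon: convergence tolerance
                break
        scores[qid] = weights
    return scores
```

Calling `peer_quality_scores({"q1": [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [5.0, 5.0]]})` gives the fourth answer the lowest score for `q1`, because it lies far from the three answers that agree with one another. Low-scoring answers are then candidates for the low-quality class.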


Notes

  1. https://answers.yahoo.com/question/index?qid=20090408172834AArbCtu.

  2. We set \(\epsilon =0.00001\) and \(N=200\) in our experiments.

  3. http://alt.qcri.org/semeval2015/task3.

  4. The Q_u_other and A_u_other features in Table 2 are only traceable in the Yahoo dataset.

  5. http://developer.yahoo.com/answers.

  6. http://lucene.apache.org/.

  7. http://www.ranks.nl/stopwords.

  8. http://gibbslda.sourceforge.net/.

  9. https://code.google.com/archive/p/word2vec/.

  10. http://www.statmt.org/moses/giza/GIZA++.html.

  11. http://nlp.stanford.edu/software/tagger.shtml.

  12. http://nlp.stanford.edu/software/lex-parser.shtml.

  13. http://www.cs.cmu.edu/~alavie/METEOR/.

  14. c: trade-off between training error and margin. j: cost factor by which training errors on positive examples outweigh errors on negative examples. b: whether to use a biased hyperplane.

  15. To save space, we only report the results on the Qatar dataset. The results on the Fatwa and Yahoo datasets show similar trends.

  16. “Non-English” and “Other” answers are categorized as “Irrelevant” answers.


Acknowledgements

This research was partially supported by grants from the National Key Research and Development Program of China (Grant No. 2016YFB1000904), the National Science Foundation for Distinguished Young Scholars of China (Grant No. 61325010), the National Natural Science Foundation of China (Grant No. 61672483), and the Fundamental Research Funds for the Central Universities of China (Grant No. WK2350000001).

Corresponding author

Correspondence to Enhong Chen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wu, H., Tian, Z., Wu, W., Chen, E. (2017). An Unsupervised Approach for Low-Quality Answer Detection in Community Question-Answering. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_6

  • DOI: https://doi.org/10.1007/978-3-319-55699-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55698-7

  • Online ISBN: 978-3-319-55699-4

  • eBook Packages: Computer Science, Computer Science (R0)
