Abstract
Taking advantage of the rapid growth of community platforms such as Yahoo Answers and Quora, Community Question Answering (CQA) systems are developed to retrieve semantically equivalent questions when users raise a new query. A typical CQA system consists of two key components: a retrieval model, which searches for similar questions, and a ranking model, which selects the most relevant one. In this paper, we propose LARQ (Learning to Ask and Rewrite Questions), a novel sentence-level data augmentation method. Unlike common lexical-level data augmentation approaches, we take advantage of a Question Generation (QG) model to obtain more accurate, diverse, and semantically rich query examples. Since queries differ greatly in a low-resource cold-start scenario, incorporating the QG model to augment the indexed collection significantly improves the response rate of CQA systems. We evaluate LARQ in an online CQA system and on the Bank Question (BQ) Corpus, measuring the enhancements to both the retrieval process and the ranking model. Extensive experimental results show that the LARQ-enhanced model significantly outperforms single BERT and XGBoost models, as well as a widely used QG model (NQG).
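The retrieve-then-rank pipeline with a QG-augmented index can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paraphrase generator is a rule-based stub standing in for a trained QG model, and token-overlap (Jaccard) scoring stands in for the retrieval and BERT-based ranking models; all function names here are hypothetical.

```python
def generate_paraphrases(question):
    """Stand-in for a Question Generation model: returns rewrites of
    `question`. A real system would use a trained seq2seq QG model."""
    # Hypothetical lexical rewrite rules, for demonstration only.
    rules = [("how do i", "how can i"), ("what is", "what's")]
    q = question.lower()
    return [q.replace(a, b) for a, b in rules if a in q]

def build_index(faq_questions):
    """Index each canonical question together with its generated
    paraphrases; every key maps back to the canonical entry."""
    index = {}
    for q in faq_questions:
        index[q.lower()] = q
        for p in generate_paraphrases(q):
            index[p] = q
    return index

def jaccard(a, b):
    """Token-overlap score, a trivial stand-in for a retrieval model."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve_and_rank(query, index, top_k=3):
    """Score every indexed question (canonical or generated), then
    return deduplicated canonical questions, best match first."""
    scored = sorted(
        ((jaccard(query.lower(), key), canon) for key, canon in index.items()),
        reverse=True,
    )
    seen, results = set(), []
    for score, canon in scored:
        if score > 0 and canon not in seen:
            seen.add(canon)
            results.append(canon)
    return results[:top_k]

# Augmenting the index with generated rewrites lets a query phrased
# differently from the canonical FAQ entry still score a strong match.
index = build_index(["How do I reset my password?",
                     "What is the transfer limit?"])
print(retrieve_and_rank("how can i reset my password", index))
```

Here the generated paraphrase "how can i reset my password?" overlaps the query more than the canonical wording does, so the canonical entry is surfaced even though the user's phrasing never appears in the original FAQ, which is the intuition behind augmenting the indexed collection.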
H. Zhou and H. Liu—Equal Contributions.
H. Liu—Work done during an internship at Tencent.
References
Ahmad, A., Constant, N., Yang, Y., Cer, D.: ReQA: an evaluation for end-to-end answer retrieval models. In: EMNLP 2019 MRQA Workshop (2019). https://doi.org/10.18653/v1/d19-5819
Alberti, C., Andor, D., Pitler, E., Devlin, J., Collins, M.: Synthetic QA corpora generation with roundtrip consistency. In: ACL (2019). https://doi.org/10.18653/v1/p19-1620
Bonadiman, D., Kumar, A., Mittal, A.: Large scale question paraphrase retrieval with smoothed deep metric learning. In: W-NUT Workshop (2019). https://doi.org/10.18653/v1/d19-5509
Borisov, A., Markov, I., de Rijke, M., Serdyukov, P.: A neural click model for web search. In: WWW. ACM (2016). https://doi.org/10.1145/2872427.2883033
Chen, J., Chen, Q., Liu, X., Yang, H., Lu, D., Tang, B.: The BQ corpus: a large-scale domain-specific Chinese corpus for sentence semantic equivalence identification. In: EMNLP (2018). https://doi.org/10.18653/v1/d18-1536
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: SIGKDD. ACM (2016). https://doi.org/10.1145/2939672.2939785
Chirita, P.A., Nejdl, W., Paiu, R., Kohlschütter, C.: Using ODP metadata to personalize search. In: SIGIR. ACM (2005). https://doi.org/10.1145/1076034.1076067
Cui, Y., et al.: Pre-training with whole word masking for Chinese BERT. arXiv:1906.08101 (2019)
Dai, Z., Xiong, C., Callan, J., Liu, Z.: Convolutional neural networks for soft-matching N-grams in ad-hoc search. In: WSDM. ACM (2018). https://doi.org/10.1145/3159652.3159659
Dehghani, M., Zamani, H., Severyn, A., Kamps, J., Croft, W.B.: Neural ranking models with weak supervision. In: SIGIR. ACM (2017). https://doi.org/10.1145/3077136.3080832
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)
Dietz, L., Verma, M., Radlinski, F., Craswell, N.: TREC complex answer retrieval overview. In: TREC (2017)
Feng, M., Xiang, B., Glass, M.R., Wang, L., Zhou, B.: Applying deep learning to answer selection: a study and an open task. In: Workshop on ASRU. IEEE (2015). https://doi.org/10.1109/asru.2015.7404872
Gu, J., Lu, Z., Li, H., Li, V.O.: Incorporating copying mechanism in sequence-to-sequence learning. In: ACL (2016). https://doi.org/10.18653/v1/p16-1154
Guo, J., Fan, Y., Ai, Q., Croft, W.B.: A deep relevance matching model for ad-hoc retrieval. In: CIKM. ACM (2016). https://doi.org/10.1145/2983323.2983769
Hawking, D.: Challenges in enterprise search. In: ADC, vol. 4. Citeseer (2004)
Jing, F., Zhang, Q.: Knowledge-enhanced attentive learning for answer selection in community question answering systems. arXiv:1912.07915 (2019)
Kumar, A., Dandapat, S., Chordia, S.: Translating web search queries into natural language questions. In: LREC (2018)
Lewis, P., Denoyer, L., Riedel, S.: Unsupervised question answering by cloze translation. In: ACL (2019). https://doi.org/10.18653/v1/p19-1484
Liu, X., et al.: LCQMC: a large-scale Chinese question matching corpus. In: COLING (2018)
Ma, J., Korotkov, I., Yang, Y., Hall, K., McDonald, R.: Zero-shot neural retrieval via domain-targeted synthetic query generation. arXiv:2004.14503 (2020)
MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: CEDR: contextualized embeddings for document ranking. In: SIGIR. ACM (2019). https://doi.org/10.1145/3331184.3331317
Nguyen, M.T., Phan, V.A., Nguyen, T.S., Nguyen, M.L.: Learning to rank questions for community question answering with ranking SVM. arXiv:1608.04185 (2016)
Rücklé, A., Moosavi, N.S., Gurevych, I.: Neural duplicate question detection without labeled training data. In: EMNLP-IJCNLP (2019). https://doi.org/10.18653/v1/d19-1171
Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M.M., Gatford, M.: Okapi at TREC-3. NIST Special Publication 109 (1995)
Sakata, W., Shibata, T., Tanaka, R., Kurohashi, S.: FAQ retrieval using query-question similarity and BERT-based query-answer relevance. In: SIGIR. ACM (2019). https://doi.org/10.1145/3331184.3331326
Sen, B., Gopal, N., Xue, X.: Support-BERT: predicting quality of question-answer pairs in MSDN using deep bidirectional transformer. arXiv:2005.08294 (2020)
Simpson, E., Gao, Y., Gurevych, I.: Interactive text ranking with Bayesian optimisation: a case study on community QA and summarisation. arXiv:1911.10183 (2019)
Tan, M., Santos, C.D., Xiang, B., Zhou, B.: LSTM-based deep learning models for non-factoid answer selection. arXiv:1511.04108 (2015)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Wong, S.C., Gatt, A., Stamatescu, V., McDonnell, M.D.: Understanding data augmentation for classification: when to warp? In: DICTA. IEEE (2016). https://doi.org/10.1109/dicta.2016.7797091
Wu, Y., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144 (2016)
Xue, X., Jeon, J., Croft, W.B.: Retrieval models for question and answer archives. In: SIGIR. ACM (2008). https://doi.org/10.1145/1390334.1390416
Yang, W., Zhang, H., Lin, J.: Simple applications of BERT for ad hoc document retrieval. arXiv:1903.10972 (2019)
Zamani, H., Dehghani, M., Croft, W.B., Learned-Miller, E., Kamps, J.: From neural re-ranking to neural ranking: learning a sparse representation for inverted indexing. In: CIKM. ACM (2018). https://doi.org/10.1145/3269206.3271800
Zhou, Q., Yang, N., Wei, F., Tan, C., Bao, H., Zhou, M.: Neural question generation from text: a preliminary study. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Yu. (eds.) NLPCC 2017. LNCS (LNAI), vol. 10619, pp. 662–671. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73618-1_56
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhou, H., Liu, H., Yan, Z., Cao, Y., Li, Z. (2020). LARQ: Learning to Ask and Rewrite Questions for Community Question Answering. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science(), vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8
eBook Packages: Computer Science (R0)