
To Answer or Not to Answer? Filtering Questions for QA Systems

  • Conference paper
  • In: Intelligent Systems (BRACIS 2022)

Abstract

Question answering (QA) systems are usually structured as strict conditional generators, returning an answer for every input question. This policy of always responding may, however, prove harmful, given the possibility of producing inaccurate answers, particularly for ambiguous or sensitive questions; it may instead be better for a QA system to decide which questions should be answered at all. In this paper, we explore dual system architectures that filter out unanswerable or meaningless questions, thus answering only a subset of the questions raised. Two experiments are performed to evaluate this modular approach: a classification task on SQuAD 2.0 for unanswerable questions, and a regression task on Pirá for question meaningfulness. Despite the difficulty of these tasks, we show that filtering questions can improve the quality of the answers generated by QA systems. By using classification and regression models to filter questions, we gain finer control over the accuracy of the answers produced by the answerer systems.
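To make the dual architecture concrete, the following is a minimal sketch of a filter-then-answer pipeline, assuming a Hugging Face text classifier as the filter and an extractive QA model as the answerer. The filter model name, its output label, and the decision threshold are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of a dual filter-answerer architecture (illustrative only):
# a classifier first decides whether a question is answerable; only questions
# that pass the filter are forwarded to the answerer.
from transformers import pipeline

# Hypothetical filter: any binary answerability classifier, e.g. one
# fine-tuned on SQuAD 2.0. The model name below is a placeholder.
question_filter = pipeline("text-classification", model="path/to/answerability-classifier")

# A real, publicly available extractive QA model, used purely for illustration.
answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def answer_or_abstain(question: str, context: str, threshold: float = 0.5):
    """Forward the question to the answerer only if the filter accepts it."""
    verdict = question_filter(question)[0]  # e.g. {"label": "UNANSWERABLE", "score": 0.93}
    if verdict["label"] == "UNANSWERABLE" and verdict["score"] >= threshold:
        return None  # abstain rather than risk an inaccurate answer
    return answerer(question=question, context=context)["answer"]
```

Raising the threshold trades coverage for accuracy: fewer questions are answered, but those that are answered are more likely to be answered correctly.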

This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support from the São Paulo Research Foundation (FAPESP grant #2019/07665-4) and the IBM Corporation. Fabio G. Cozman thanks the National Council for Scientific and Technological Development of Brazil (CNPq grant #312180/2018-7) for its support.


Notes

  1. To ensure reproducibility, the code, dataset partitions, and trained models are made available at the project’s GitHub repository: https://github.com/C4AI/Pira/tree/main/Triggering.

  2. In Pirá, only QA sets with meaningfulness evaluations were used. For the original dataset, the numbers would be: train: 1896 (79.98%); validation: 225 (9.96%); test: 227 (10%); total: 2258 (100%).

  3. F1-score is implemented with the official SQuAD script, available at https://rajpurkar.github.io/SQuAD-explorer/; a re-implementation sketch is given after these notes.

  4. Both the classifiers described in this section and the regressors trained in the next use random initializations that may result in slightly different predictions. To ensure the consistency of results, we repeated each experiment 10 times; the results described here are, therefore, representative of the trained models. A sketch of this repetition protocol also follows these notes.
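As mentioned in note 3, F1 is computed with the official SQuAD evaluation script. For illustration, the sketch below re-implements its token-level F1: answers are normalized (lowercased, stripped of punctuation and English articles, whitespace-collapsed) and scored by token overlap. This is a re-implementation under those assumptions; the official script should be used for reported numbers.

```python
# Illustrative re-implementation of the token-level F1 used by the official
# SQuAD evaluation script; use the official script for reported results.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, f1_score("the Pacific Ocean", "Pacific Ocean") returns 1.0, since articles are stripped during normalization.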
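Likewise, the repetition protocol from note 4 can be summarized in a few lines. The sketch below assumes a generic run_experiment callable (a stand-in, not from the paper's code) that trains a model under the current seeds and returns a single evaluation metric.

```python
# Illustrative repetition protocol: run the same experiment under several
# random seeds and report aggregate statistics over the resulting metric.
import random
import statistics

import numpy as np
import torch

def repeat_experiment(run_experiment, n_runs: int = 10):
    """Average a metric over repeated runs with different seeds."""
    scores = []
    for seed in range(n_runs):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        scores.append(run_experiment())  # train and evaluate once
    return statistics.mean(scores), statistics.stdev(scores)
```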

References

  1. Acheampong, K.N., Tian, W., Sifah, E.B., Opuni-Boachie, K.O.-A.: The emergence, advancement and future of textual answer triggering. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) SAI 2020. AISC, vol. 1229, pp. 674–693. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52246-9_50


  2. Paschoal, A.F.A., et al.: Pirá: a bilingual Portuguese-English dataset for question-answering about the ocean. In: 30th ACM International Conference on Information and Knowledge Management (CIKM 2021) (2021). https://doi.org/10.1145/3459637.3482012

  3. Brown, T.B., et al.: Language models are few-shot learners. CoRR abs/2005.14165 (2020). https://arxiv.org/abs/2005.14165

  4. Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: Barzilay, R., Kan, M. (eds.) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, 30 July–4 August, Volume 1: Long Papers, pp. 1870–1879. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1171

  5. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/n19-1423

  6. European Commission: Proposal for a regulation laying down harmonised rules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts (2021). https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206&from=EN#footnote8

  7. Ferrucci, D.A.: Introduction to “This is Watson”. IBM J. Res. Dev. 56(3), 1 (2012). https://doi.org/10.1147/JRD.2012.2184356


  8. Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: a new benchmark for selection-based question answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, San Jose, CA, USA, 6–8 November 2016, pp. 820–827. IEEE Computer Society (2016). https://doi.org/10.1109/ICTAI.2016.0128

  9. Kadavath, S., et al.: Language models (mostly) know what they know (2022). https://arxiv.org/abs/2207.05221

  10. Karpukhin, V., et al.: Dense passage retrieval for open-domain question answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, 16–20 November 2020, pp. 6769–6781. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550

  11. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. OpenReview.net (2020). https://openreview.net/forum?id=H1eA7AEtvS

  12. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020, pp. 7871–7880. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.703

  13. Lewis, P.S.H., et al.: Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 6–12 December 2020, virtual (2020)


  14. Liu, C., Lowe, R., Serban, I., Noseworthy, M., Charlin, L., Pineau, J.: How NOT to evaluate your dialogue system: an empirical study of unsupervised evaluation metrics for dialogue response generation. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, 1–4 November 2016, pp. 2122–2132. The Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/d16-1230

  15. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019). http://arxiv.org/abs/1907.11692

  16. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)


  17. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140:1–140:67 (2020). http://jmlr.org/papers/v21/20-074.html

  18. Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: unanswerable questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15–20 July 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/P18-2124

  19. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, 1–4 November 2016, pp. 2383–2392. The Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/d16-1264

  20. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 3–7 November 2019, pp. 3980–3990. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1410

  21. Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting closer to AI complete question answering: a set of prerequisite real tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 February 2020, pp. 8722–8731. AAAI Press (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398

  22. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108 (2019). http://arxiv.org/abs/1910.01108

  23. Thoppilan, R., et al.: LaMDA: language models for dialog applications (2022)


  24. Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: a large scale Chinese reading comprehension dataset on opinion. CoRR abs/2006.12146 (2020). https://arxiv.org/abs/2006.12146

  25. Welbl, J., et al.: Challenges in detoxifying language models. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 2447–2469. Association for Computational Linguistics, November 2021. https://doi.org/10.18653/v1/2021.findings-emnlp.210

  26. Yang, Y., Yih, W.T., Meek, C.: WikiQA: a challenge dataset for open-domain question answering. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2013–2018. Association for Computational Linguistics, Lisbon, September 2015. https://doi.org/10.18653/v1/D15-1237


Author information

Correspondence to Paulo Pirozelli.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G. (2022). To Answer or Not to Answer? Filtering Questions for QA Systems. In: Xavier-Junior, J.C., Rios, R.A. (eds.) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol. 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_33


  • DOI: https://doi.org/10.1007/978-3-031-21689-3_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21688-6

  • Online ISBN: 978-3-031-21689-3

  • eBook Packages: Computer Science, Computer Science (R0)
