A Non-Factoid Question-Answering Taxonomy

Published: 07 July 2022
DOI: 10.1145/3477495.3531926

Abstract

Non-factoid question answering (NFQA) is a challenging and under-researched task that requires constructing long-form answers, such as explanations or opinions, to open-ended non-factoid questions (NFQs). There is still little understanding of the categories of NFQs that people tend to ask, what form of answers they expect to see in return, and what the key research challenges of each category are.
This work presents the first comprehensive taxonomy of NFQ categories and the expected structure of answers. The taxonomy was constructed with a transparent methodology and extensively evaluated via crowdsourcing. The most challenging categories were identified through an editorial user study. We also release a dataset of categorised NFQs and a question category classifier.
Finally, we conduct a quantitative analysis of the distribution of question categories using major NFQA datasets, showing that the NFQ categories that are the most challenging for current NFQA systems are poorly represented in these datasets. This imbalance may lead to insufficient system performance for challenging categories. The new taxonomy, along with the category classifier, will aid research in the area, helping to create more balanced benchmarks and to focus models on addressing specific categories.

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataset analysis
  2. editorial study
  3. non-factoid question-answering
  4. question taxonomy

Qualifiers

  • Research-article

Funding Sources

  • Australian Research Council Project

Conference

SIGIR '22

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 293
  • Downloads (Last 6 weeks): 34
Reflects downloads up to 28 Feb 2025

Cited By

  • (2024) Coherence-based Query Performance Measures for Dense Retrieval. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 15-24. DOI: 10.1145/3664190.3672518. Online publication date: 2-Aug-2024
  • (2024) Unveiling Information Through Narrative In Conversational Information Seeking. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-6. DOI: 10.1145/3640794.3665884. Online publication date: 8-Jul-2024
  • (2024) Automatic Large Language Model Evaluation via Peer Review. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 384-393. DOI: 10.1145/3627673.3679677. Online publication date: 21-Oct-2024
  • (2024) ArabicaQA: A Comprehensive Dataset for Arabic Question Answering. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2049-2059. DOI: 10.1145/3626772.3657889. Online publication date: 10-Jul-2024
  • (2024) CardiO: Predicting Cardinality from Online Sources. Companion Proceedings of the ACM on Web Conference 2024, 573-576. DOI: 10.1145/3589335.3651477. Online publication date: 13-May-2024
  • (2024) Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance. IEEE Access, 1-1. DOI: 10.1109/ACCESS.2024.3513155. Online publication date: 2024
  • (2023) An Intent Taxonomy of Legal Case Retrieval. ACM Transactions on Information Systems 42:2, 1-27. DOI: 10.1145/3626093. Online publication date: 11-Dec-2023
