An exploration of the principles underlying redundancy-based factoid question answering

Author:

Jimmy Lin

ACM Transactions on Information Systems (TOIS), Volume 25, Issue 2

Pages 6 - es

https://doi.org/10.1145/1229179.1229180

Published: 01 April 2007

Abstract

The so-called “redundancy-based” approach to question answering represents a successful strategy for mining answers to factoid questions such as “Who shot Abraham Lincoln?” from the World Wide Web. Through contrastive and ablation experiments with Aranea, a system that has performed well in several TREC QA evaluations, this work examines the underlying assumptions and principles behind redundancy-based techniques. Specifically, we develop two theses: that stable characteristics of data redundancy allow factoid systems to rely on external “black box” components, and that despite embodying a data-driven approach, redundancy-based methods encode a substantial amount of knowledge in the form of heuristics. Overall, this work attempts to address the broader question of “what really matters” and to provide guidance for future researchers.

Index Terms

An exploration of the principles underlying redundancy-based factoid question answering
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published In

ACM Transactions on Information Systems Volume 25, Issue 2

April 2007

141 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1229179

Copyright © 2007 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States
