An exploration of the principles underlying redundancy-based factoid question answering

Published: 01 April 2007

Abstract

The so-called “redundancy-based” approach to question answering represents a successful strategy for mining answers to factoid questions such as “Who shot Abraham Lincoln?” from the World Wide Web. Through contrastive and ablation experiments with Aranea, a system that has performed well in several TREC QA evaluations, this work examines the underlying assumptions and principles behind redundancy-based techniques. Specifically, we develop two theses: that stable characteristics of data redundancy allow factoid systems to rely on external “black box” components, and that despite embodying a data-driven approach, redundancy-based methods encode a substantial amount of knowledge in the form of heuristics. Overall, this work attempts to address the broader question of “what really matters” and to provide guidance for future researchers.
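The core redundancy idea, that a correct answer to a factoid question tends to recur across many independent texts, can be illustrated with a small vote-counting sketch. This is not Aranea's implementation. The snippets are hypothetical stand-ins for results returned by an external search engine (the "black box" component), and the stopword list, the two filters, and the tie-breaking rule are illustrative assumptions meant to show where hand-written heuristics enter an otherwise data-driven pipeline. The function names `mine_candidates` and `best_answer` are likewise hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real system would use a larger, tuned list.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "by", "to",
             "was", "is", "who", "what", "when", "where"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/digits (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mine_candidates(snippets: list[str], question: str, max_n: int = 3) -> Counter:
    """Vote for every n-gram (n <= max_n), counting each n-gram at most once per snippet."""
    question_terms = set(tokenize(question))
    votes: Counter = Counter()
    for snippet in snippets:
        tokens = tokenize(snippet)
        seen: set[tuple[str, ...]] = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Heuristic filter 1: a candidate should not begin or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                # Heuristic filter 2: a candidate should not repeat words from the question.
                if question_terms.intersection(gram):
                    continue
                seen.add(gram)
        votes.update(seen)
    return votes


def best_answer(snippets: list[str], question: str) -> str:
    """Rank by vote count, breaking ties in favor of longer n-grams (a crude stand-in for answer tiling)."""
    votes = mine_candidates(snippets, question)
    gram, _ = max(votes.items(), key=lambda item: (item[1], len(item[0])))
    return " ".join(gram)


# Hypothetical snippets standing in for search-engine results.
snippets = [
    "John Wilkes Booth shot Abraham Lincoln at Ford's Theatre.",
    "Lincoln was shot by John Wilkes Booth in 1865.",
    "Actor John Wilkes Booth assassinated President Lincoln.",
]
print(best_answer(snippets, "Who shot Abraham Lincoln?"))  # prints: john wilkes booth
```

Because each n-gram counts at most once per snippet, a single page that repeats a phrase cannot outvote several independent sources. Without the question-term filter, "lincoln" would also collect three votes, which shows how much of the final ranking depends on heuristics rather than on raw frequency alone.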

Cited By

  • (2024) Advancements in Geospatial Question-Answering Systems: A Case Study on the Implementation in the Kazakh Language. 2024 IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering (PIERE), 1710-1715. DOI: 10.1109/PIERE62470.2024.10805012. Online publication date: 15-Nov-2024.
  • (2024) Development of a Geographical Question-Answering System in the Kazakh Language. IEEE Access 12, 105460-105469. DOI: 10.1109/ACCESS.2024.3433426. Online publication date: 2024.
  • (2024) Designing for the Future of Information Access with Generative Information Retrieval. Information Access in the Era of Generative AI, 223-248. DOI: 10.1007/978-3-031-73147-1_9. Online publication date: 12-Sep-2024.

      Published In

      ACM Transactions on Information Systems, Volume 25, Issue 2
      April 2007
      141 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/1229179

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 April 2007
      Published in TOIS Volume 25, Issue 2

      Author Tags

      1. Data redundancy
      2. Web search

      Qualifiers

      • Article

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 2
      Reflects downloads up to 07 Mar 2025

      Citations

      • (2023) Virtual Customer Assistants in finance: From state of the art and practices to design guidelines. Computer Science Review 47, 100534. DOI: 10.1016/j.cosrev.2023.100534. Online publication date: Feb-2023.
      • (2021) One Arm to Rule Them All: Online Learning with Multi-armed Bandits for Low-Resource Conversational Agents. Progress in Artificial Intelligence, 625-634. DOI: 10.1007/978-3-030-86230-5_49. Online publication date: 3-Sep-2021.
      • (2020) DPAEG. Security and Communication Networks 2020. DOI: 10.1155/2020/5890820. Online publication date: 13-Jan-2020.
      • (2020) Enhanced Attentive Convolutional Neural Networks for Sentence Pair Modeling. Expert Systems with Applications, 113384. DOI: 10.1016/j.eswa.2020.113384. Online publication date: Mar-2020.
      • (2020) Natural Language Processing. Fundamentals of Artificial Intelligence, 603-649. DOI: 10.1007/978-81-322-3972-7_19. Online publication date: 5-Apr-2020.
      • (2020) Leveraging Label Semantics and Correlations for Judgment Prediction. Information Retrieval, 70-82. DOI: 10.1007/978-3-030-56725-5_6. Online publication date: 14-Aug-2020.
      • (2019) Reading Comprehension based Question Answering technique by Focusing on Identifying Question Intention. Transactions of the Japanese Society for Artificial Intelligence 34, 5, A-J14_1-12. DOI: 10.1527/tjsai.A-J14. Online publication date: 1-Sep-2019.