Answering questions with an n-gram based passage retrieval engine

Journal of Intelligent Information Systems

Abstract

In this paper, we present a Question Answering system based on redundancy and a Passage Retrieval method that is specifically oriented to Question Answering. We assume that, in a sufficiently large document collection, the answer to a given question may appear in several different forms. It is therefore possible to find one or more sentences that contain the answer and that also include tokens from the original question. The Passage Retrieval engine is almost language-independent, since it is based on n-gram structures. The question classification and answer extraction modules are based on shallow patterns.
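The idea of ranking passages by the question n-grams they contain can be illustrated with a minimal sketch. It is not the JIRS weighting scheme, which this abstract does not specify. The function names (`tokenize`, `ngrams`, `ngram_overlap_score`, `rank_passages`) and the choice to weight each matched n-gram by its length are assumptions made only for illustration.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams of order n (empty if n is too large)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap_score(question: str, passage: str) -> float:
    """Length-weighted fraction of question n-grams that also occur in the passage.

    Hypothetical scoring: every n-gram order from 1 up to the question length
    is considered, and each n-gram counts n times, so longer matches weigh more.
    """
    q_tokens = tokenize(question)
    p_tokens = tokenize(passage)
    if not q_tokens:
        return 0.0
    matched = 0
    total = 0
    for n in range(1, len(q_tokens) + 1):
        q_set = ngrams(q_tokens, n)
        p_set = ngrams(p_tokens, n)
        total += n * len(q_set)
        matched += n * len(q_set & p_set)
    return matched / total


def rank_passages(question: str, passages: List[str]) -> List[str]:
    """Sort passages from highest to lowest overlap score."""
    return sorted(
        passages,
        key=lambda passage: ngram_overlap_score(question, passage),
        reverse=True,
    )
```

Because the sketch relies only on whitespace tokenization and n-gram matching, it needs no language-specific resources, which reflects the sense in which an n-gram based engine can be called almost language-independent.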

Notes

  1. http://www.clef-campaign.org

  2. http://trec.nist.gov

  3. http://www.yahoo.com

  4. The passage retrieval engine JIRS can be obtained at the following URL: http://sourceforge.net/projects/jirs/.

  5. Note that the comma (,) is included in the position count.

Acknowledgements

We would like to thank the TIN2006-15265-C06-04 research project for partially supporting this work.

Author information

Corresponding author

Correspondence to Davide Buscaldi.

About this article

Cite this article

Buscaldi, D., Rosso, P., Gómez-Soriano, J.M. et al. Answering questions with an n-gram based passage retrieval engine. J Intell Inf Syst 34, 113–134 (2010). https://doi.org/10.1007/s10844-009-0082-y
