Abstract
In this paper, we present a Question Answering system based on redundancy, together with a Passage Retrieval method specifically oriented to Question Answering. We assume that, in a sufficiently large document collection, the answer to a given question may appear in several different forms. It should therefore be possible to find one or more sentences that contain the answer and also include tokens from the original question. The Passage Retrieval engine is almost language-independent, since it is based on n-gram structures. The question classification and answer extraction modules are based on shallow patterns.
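The intuition behind n-gram based passage retrieval is that a passage reproducing longer stretches of the question's wording is more likely to contain the answer. The sketch below illustrates that intuition under stated assumptions. It is a hypothetical scoring function (the names `tokenize`, `ngrams` and `ngram_overlap_score` are ours), not the weighting used by JIRS, which is more elaborate and is not reproduced here. Tokenization by a simple regular expression is also an assumption. Each question n-gram found in the passage contributes its length n, and the sum is normalized by the question's own total, so a passage containing the whole question verbatim scores 1.0.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-grams of length n, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap_score(question: str, passage: str) -> float:
    """Hypothetical score in [0, 1]: question n-grams found in the passage,
    each weighted by its length n, normalized by the question's own total."""
    q = tokenize(question)
    p = tokenize(passage)
    found = 0
    total = 0
    for n in range(1, len(q) + 1):
        q_grams = ngrams(q, n)
        p_grams = ngrams(p, n)
        total += n * len(q_grams)            # best achievable contribution
        found += n * len(q_grams & p_grams)  # contribution actually matched
    return found / total if total else 0.0


if __name__ == "__main__":
    question = "Who is the president of Mexico"
    passages = [
        "Vicente Fox is the president of Mexico.",
        "Mexico has a president.",
    ]
    # Rank candidate passages by decreasing n-gram overlap with the question.
    for passage in sorted(passages, key=lambda s: ngram_overlap_score(question, s), reverse=True):
        print(f"{ngram_overlap_score(question, passage):.3f}  {passage}")
```

Here the first passage ranks above the second because it contains the 5-gram "is the president of mexico", whereas the second shares only two isolated words with the question. Weighting each match by n is what rewards contiguous matches over scattered keywords; only the tokenizer depends on the language.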
Notes
The passage retrieval engine JIRS can be obtained at the following URL: http://sourceforge.net/projects/jirs/.
Note that the comma (,) is included in the position count.
Acknowledgements
We would like to thank the TIN2006-15265-C06-04 research project for partially supporting this work.
Cite this article
Buscaldi, D., Rosso, P., Gómez-Soriano, J.M. et al. Answering questions with an n-gram based passage retrieval engine. J Intell Inf Syst 34, 113–134 (2010). https://doi.org/10.1007/s10844-009-0082-y