Abstract
This paper describes an approach to integrating linguistic information into passage retrieval in an open-source question answering system for Dutch. Annotation produced by the wide-coverage dependency parser Alpino is stored in multiple index layers and matched against natural language questions that have been analyzed by the same parser. We present a genetic algorithm that selects the features to be included in retrieval queries and optimizes keyword weights. The system is trained on questions annotated with their answers from the competition on Dutch question answering within the Cross-Language Evaluation Forum (CLEF). The optimization yielded a significant improvement of about 19% in mean reciprocal rank scores on unseen evaluation data compared to the baseline, which uses traditional information retrieval with plain text keywords.
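The evaluation measure named above, mean reciprocal rank (MRR), can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names, the boolean relevance flags, and the toy data are assumptions made for illustration, and the helper for relative change assumes the reported improvement is a relative one, which the abstract does not state explicitly.

```python
from typing import Sequence


def mean_reciprocal_rank(relevance_per_question: Sequence[Sequence[bool]]) -> float:
    """Average of 1/rank of the first relevant passage for each question.

    Each inner sequence holds one flag per retrieved passage, in ranked
    order; True means the passage contains a correct answer. A question
    with no relevant passage contributes 0.
    """
    if not relevance_per_question:
        return 0.0
    total = 0.0
    for flags in relevance_per_question:
        for rank, hit in enumerate(flags, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant passage counts
    return total / len(relevance_per_question)


def relative_improvement(baseline: float, optimized: float) -> float:
    """Relative change of a score with respect to a baseline score."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (optimized - baseline) / baseline


# Hypothetical toy data: three questions, ranked passages flagged as relevant or not.
toy_run = [[False, True], [True], [False, False, False]]
toy_mrr = mean_reciprocal_rank(toy_run)  # (1/2 + 1/1 + 0) / 3
```

Under this definition, a retrieval setting that moves the first answer-bearing passage higher in the ranking raises the score, which is the quantity a query optimizer of the kind described above would try to maximize.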
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tiedemann, J. (2005). Improving Passage Retrieval in Question Answering Using NLP. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_62
DOI: https://doi.org/10.1007/11595014_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30737-2
Online ISBN: 978-3-540-31646-6
eBook Packages: Computer Science, Computer Science (R0)