Abstract
Information retrieval addresses the problem of finding those documents whose content matches a user’s request from among a large collection of documents. Currently, the most successful general purpose retrieval methods are statistical methods that treat text as little more than a bag of words. However, attempts to improve retrieval performance through more sophisticated linguistic processing have been largely unsuccessful. Indeed, unless done carefully, such processing can degrade retrieval effectiveness.
Several factors contribute to the difficulty of improving on a good statistical baseline, including the forgiving nature but broad coverage of the typical retrieval task, the lack of good weighting schemes for compound index terms, and the implicit linguistic processing inherent in the statistical methods. Natural language processing techniques may be more important for related tasks such as question answering or document summarization.
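To make the "bag of words" baseline concrete, the following is a minimal sketch of statistical retrieval: documents and queries become term-weight vectors and are ranked by cosine similarity. The tf-idf weighting, the whitespace tokenizer, and all function names here are illustrative assumptions, not the specific systems evaluated in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. Word order is
    # discarded as soon as the tokens are counted.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per document, plus the idf table."""
    bags = [Counter(tokenize(d)) for d in docs]
    n = len(bags)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in bag.items()} for bag in bags]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Sort by descending score, breaking ties by document index.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    docs = [
        "the bank raised interest rates",
        "the river bank flooded",
        "interest in river fishing grew",
    ]
    print(rank("bank interest rates", docs))
```

Because only term counts are kept, reordering the query words leaves the ranking unchanged, and a term such as "bank" matches both the financial and the river sense. That is the kind of limitation linguistic processing aims to address.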
Keywords
- Information Retrieval
- Retrieval System
- Average Precision
- Retrieval Performance
- Word Sense Disambiguation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Voorhees, E.M. (1999). Natural Language Processing and Information Retrieval. In: Pazienza, M.T. (ed.) Information Extraction. SCIE 1999. Lecture Notes in Computer Science, vol 1714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48089-7_3
DOI: https://doi.org/10.1007/3-540-48089-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66625-7
Online ISBN: 978-3-540-48089-1
eBook Packages: Springer Book Archive