Abstract
This paper presents several techniques for selecting noun phrases for interactive query expansion following pseudo-relevance feedback, together with a new phrase search method. Phrases were selected with a combined syntactico-statistical method: first, noun phrases were identified using a part-of-speech tagger and a noun-phrase chunker; second, different statistical measures were applied to select phrases for query expansion. Experiments were also conducted to study the effectiveness of noun phrases in document ranking. We analyse the problems of phrase weighting and suggest new ways of addressing them. A new method of phrase matching and weighting was developed that specifically addresses the problem of weighting overlapping and non-contiguous word sequences in documents.
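The two-stage selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the input is assumed to be already part-of-speech tagged with Penn Treebank tags, the chunking rule (a run of adjectives and nouns ending in a noun) is a simplified stand-in for a trained noun-phrase chunker, and the score (the number of feedback documents containing the phrase) is a placeholder for the statistical measures, which the abstract does not name. The function names `noun_phrases` and `rank_phrases` are hypothetical.

```python
from collections import Counter

# Assumed Penn Treebank tags; a real tagger would supply these.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}


def noun_phrases(tagged):
    """Return multi-word noun phrases from a list of (word, tag) pairs.

    Simplified rule (an assumption, not the paper's chunker): take each
    maximal run of adjectives and nouns, trim trailing adjectives so the
    phrase ends in a noun, and keep runs of two or more words.
    """
    phrases, current = [], []
    # A sentinel token closes the final run.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            current.append((word.lower(), tag))
            continue
        while current and current[-1][1] not in NOUN_TAGS:
            current.pop()
        if len(current) >= 2:
            phrases.append(" ".join(w for w, _ in current))
        current = []
    return phrases


def rank_phrases(tagged_docs, top_n=5):
    """Rank phrases by how many feedback documents contain them.

    Document frequency is only a placeholder for the statistical
    selection step; ties are broken alphabetically.
    """
    doc_freq = Counter()
    for tagged in tagged_docs:
        doc_freq.update(set(noun_phrases(tagged)))
    return sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy stand-in for the top-ranked documents of an initial search.
feedback_docs = [
    [("interactive", "JJ"), ("query", "NN"), ("expansion", "NN"),
     ("helps", "VBZ"), ("users", "NNS")],
    [("the", "DT"), ("query", "NN"), ("expansion", "NN"),
     ("is", "VBZ"), ("useful", "JJ")],
    [("query", "NN"), ("expansion", "NN")],
]
ranked = rank_phrases(feedback_docs)
```

In this sketch a phrase nested inside a longer one (for example "query expansion" inside "interactive query expansion") is not counted separately. How to treat such nested and overlapping sequences is a design decision that the sketch leaves open.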
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vechtomova, O. (2005). The Role of Multi-word Units in Interactive Information Retrieval. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_29
DOI: https://doi.org/10.1007/978-3-540-31865-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science (R0)