Abstract
We describe a principled method for representing documents by phrases abstracted into Head/Modifier (HM) pairs. First, the notion of aboutness and the characterization of full-text documents by HM pairs are discussed. Based on linguistic arguments, a taxonomy of HM pairs is derived. We briefly describe the EP4IR parser/transducer of English and present some statistics on the distribution of HM pairs in newspaper text.
Based on the HM pairs generated, a new technique to measure the accuracy of a parser is introduced and applied to the EP4IR grammar of English. Finally, we discuss the merits of HM pairs and HM trees as a document representation.
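To make the idea of an HM-pair representation concrete, the following is a minimal sketch, not the chapter's method. It assumes a document can be treated as a bag of (head, modifier) pairs and that parser accuracy can be approximated by precision and recall of the produced pairs against a hand-checked reference set. The abstract does not specify the actual measure, so this choice is an assumption. The names `HMPair`, `hm_profile` and `pair_precision_recall` are hypothetical, and the example pairs are invented for illustration rather than taken from EP4IR output.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class HMPair(NamedTuple):
    """A head/modifier pair, e.g. head='engine', modifier='diesel'."""
    head: str
    modifier: str


def hm_profile(pairs: Iterable[HMPair]) -> Counter:
    """Represent a document as a bag (multiset) of HM pairs."""
    return Counter(pairs)


def pair_precision_recall(produced: Counter, reference: Counter) -> Tuple[float, float]:
    """Assumed accuracy measure: multiset overlap between the pairs a parser
    produced and a hand-checked reference set of pairs."""
    overlap = sum((produced & reference).values())  # pairs found in both bags
    n_produced = sum(produced.values())
    n_reference = sum(reference.values())
    precision = overlap / n_produced if n_produced else 0.0
    recall = overlap / n_reference if n_reference else 0.0
    return precision, recall


# Invented example pairs (not real parser output).
produced = hm_profile([
    HMPair("engine", "diesel"),
    HMPair("engine", "new"),
    HMPair("car", "engine"),
])
reference = hm_profile([
    HMPair("engine", "diesel"),
    HMPair("engine", "new"),
    HMPair("repair", "engine"),
])

precision, recall = pair_precision_recall(produced, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Using a multiset rather than a set keeps pair frequencies, which matters if the same pair occurs several times in a document. Whether the chapter counts pairs this way is not stated in the abstract.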
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koster, C.H.A. (2004). Head/Modifier Frames for Information Retrieval. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_51
DOI: https://doi.org/10.1007/978-3-540-24630-5_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive