Head/Modifier Frames for Information Retrieval

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2945))

Abstract

We describe a principled method for representing documents by phrases abstracted into Head/Modifier (HM) pairs. First, the notion of aboutness and the characterization of full-text documents by HM pairs are discussed. Based on linguistic arguments, a taxonomy of HM pairs is derived. We briefly describe the EP4IR parser/transducer of English and present some statistics on the distribution of HM pairs in newspaper text.

Based on the HM pairs generated, a new technique for measuring the accuracy of a parser is introduced and applied to the EP4IR grammar of English. Finally, we discuss the merits of HM pairs and HM trees as a document representation.
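
As a rough illustration of the idea, a document can be held as a multiset of (head, modifier) pairs, and a parser's output can be compared with a hand-checked set of pairs. The sketch below is not taken from the paper: the function names `hm_bag` and `pair_precision_recall`, the lowercasing step, the example pairs, and the use of precision and recall as the accuracy measure are all assumptions made for illustration. The paper's actual technique may differ.

```python
from collections import Counter

def hm_bag(pairs):
    """Represent a document as a multiset of (head, modifier) pairs.

    Lowercasing is an assumed normalization, not something the abstract specifies.
    """
    return Counter((head.lower(), modifier.lower()) for head, modifier in pairs)

def pair_precision_recall(predicted, reference):
    """Compare two HM-pair multisets (hypothetical accuracy measure).

    Counter intersection keeps, for each pair, the smaller of the two counts.
    """
    overlap = sum((predicted & reference).values())
    precision = overlap / sum(predicted.values()) if predicted else 0.0
    recall = overlap / sum(reference.values()) if reference else 0.0
    return precision, recall

# Made-up pairs standing in for parser output and a hand-checked reference.
predicted = hm_bag([
    ("retrieval", "information"),
    ("parser", "English"),
    ("text", "newspaper"),
])
reference = hm_bag([
    ("retrieval", "information"),
    ("text", "newspaper"),
    ("pairs", "HM"),
    ("parser", "fast"),
])

precision, recall = pair_precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating the representation as a multiset rather than a set keeps pair frequencies available, which matters if the pairs are later used as weighted index terms.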

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koster, C.H.A. (2004). Head/Modifier Frames for Information Retrieval. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_51

  • DOI: https://doi.org/10.1007/978-3-540-24630-5_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive
