A Search Engine for Morphologically Complex Languages

  • Conference paper
Advances in Intelligent Data Analysis (IDA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2189)

Abstract

Document retrieval for natural languages with a rich morphology, particularly in terms of derivation and (single-word) composition, suffers from serious performance degradation under the direct query-term-to-text-word matching paradigm that underlies the vast majority of current search engines. We propose an alternative approach in which morphologically complex word forms, which appear in the query as well as in the documents, are segmented into relevant subwords (such as stems, named entities, acronyms) and are subsequently submitted to the matching procedure. We evaluate our approach with the AltaVista™ Search Engine on a large medical document collection.
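
The contrast between whole-word matching and subword matching can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the dictionary-based segmentation strategy, the rule for dropping one-letter linking elements, and the function names `segment`, `subwords`, and `matches` are all assumptions made for illustration.

```python
from functools import lru_cache

# Toy subword lexicon (hypothetical; a real system would use a curated one).
LEXICON = {"gastr", "o", "enter", "itis", "hepat", "nephr"}


def segment(word, lexicon=LEXICON):
    """Split `word` into lexicon entries that cover it completely.

    Longer leading subwords are tried first, with backtracking.
    Returns a list of subwords, or None if no full cover exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(word):
            return ()
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in lexicon:
                rest = solve(end)
                if rest is not None:
                    return (piece,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None


def subwords(text):
    """Map a text to its set of index terms.

    Words that cannot be segmented are kept whole; one-letter pieces
    (linking elements such as "o") are dropped. Both rules are assumptions.
    """
    terms = set()
    for token in text.lower().split():
        parts = segment(token) or [token]
        terms.update(p for p in parts if len(p) > 1)
    return terms


def matches(query, document):
    """True if every subword of the query also occurs in the document."""
    return subwords(query) <= subwords(document)


if __name__ == "__main__":
    doc = "acute gastroenteritis"
    print("enteritis" in doc.split())   # whole-word matching misses the document
    print(matches("enteritis", doc))    # subword matching finds it
```

In this sketch the query and the document pass through the same segmentation step before comparison, mirroring the symmetric treatment of query and document described in the abstract.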

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hahn, U., Honeck, M., Schulz, S. (2001). A Search Engine for Morphologically Complex Languages. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds) Advances in Intelligent Data Analysis. IDA 2001. Lecture Notes in Computer Science, vol 2189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44816-0_8

  • DOI: https://doi.org/10.1007/3-540-44816-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42581-6

  • Online ISBN: 978-3-540-44816-7

  • eBook Packages: Springer Book Archive
