Part of the book series: The Information Retrieval Series ((INRE,volume 22))

Abstract

This chapter describes two algorithms for probabilistic stemming. A probabilistic stemmer detects word stems using a probabilistic or statistical model that requires little or no knowledge of the language for which the stemmer is built. Alongside the two stemming models, the chapter reflects on and analyzes the potential of this approach to stemming in the context of information retrieval.
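
To make the general idea concrete, the following is a minimal sketch of a language-independent statistical stemmer. It is not either of the two algorithms described in the chapter, which the abstract does not detail. It assumes a simple illustrative model: every word is split at each position, prefix and suffix frequencies are counted over a word list, and the split with the highest estimated prefix probability times suffix probability is chosen. The names `train`, `stem`, and `min_stem`, and the toy lexicon, are hypothetical.

```python
from collections import Counter


def train(words):
    """Count every prefix and suffix produced by splitting each distinct word.

    The stem part is never empty; the suffix part may be empty.
    """
    prefixes, suffixes = Counter(), Counter()
    for w in set(words):
        for i in range(1, len(w) + 1):
            prefixes[w[:i]] += 1
            suffixes[w[i:]] += 1
    return prefixes, suffixes


def stem(word, prefixes, suffixes, min_stem=3):
    """Return the prefix of `word` whose split scores highest.

    The score is count(prefix) * count(suffix). The normalising constants
    are identical for every split, so ranking by this product is the same
    as ranking by the estimated P(prefix) * P(suffix). Ties go to the
    shorter stem (an assumption of this sketch), and a word with no
    positively scored split is returned unchanged.
    """
    best, best_score = word, 0
    for i in range(min_stem, len(word) + 1):
        score = prefixes[word[:i]] * suffixes[word[i:]]
        if score > best_score:  # strict '>' keeps the shorter stem on ties
            best, best_score = word[:i], score
    return best


# Toy lexicon, for illustration only.
lexicon = [
    "walk", "walks", "walked", "walking",
    "talk", "talks", "talked", "talking",
    "jump", "jumps", "jumped", "jumping",
]
prefixes, suffixes = train(lexicon)
print([stem(w, prefixes, suffixes) for w in ["walking", "jumped", "talks", "walk"]])
# prints ['walk', 'jump', 'talk', 'walk']
```

No suffix list or grammar rule is supplied: the split points emerge from frequency counts alone, which is what allows this kind of stemmer to be built with little or no knowledge of the language.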

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Melucci, M., Orio, N. (2008). Two Algorithms for Probabilistic Stemming. In: Agosti, M. (ed.) Information Access through Search Engines and Digital Libraries. The Information Retrieval Series, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75134-2_4

  • DOI: https://doi.org/10.1007/978-3-540-75134-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75133-5

  • Online ISBN: 978-3-540-75134-2

  • eBook Packages: Computer Science, Computer Science (R0)
