Abstract
This chapter describes two algorithms for probabilistic stemming. A probabilistic stemmer detects word stems by means of a probabilistic or statistical model, using little or no knowledge of the language for which the stemmer is built. Alongside the two probabilistic stemming models, the chapter reflects on and analyzes the potential of this approach to stemming in the context of information retrieval.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Melucci, M., Orio, N. (2008). Two Algorithms for Probabilistic Stemming. In: Agosti, M. (eds) Information Access through Search Engines and Digital Libraries. The Information Retrieval Series, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75134-2_4
DOI: https://doi.org/10.1007/978-3-540-75134-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75133-5
Online ISBN: 978-3-540-75134-2
eBook Packages: Computer Science, Computer Science (R0)