Two Algorithms for Probabilistic Stemming

Melucci, Massimo; Orio, Nicola

doi:10.1007/978-3-540-75134-2_4

Part of the book series: The Information Retrieval Series (INRE, volume 22)

Abstract

This chapter describes two algorithms for probabilistic stemming. A probabilistic stemmer detects word stems with a probabilistic or statistical model and uses little or no knowledge of the language it is built for. Alongside the two probabilistic stemming models, the chapter examines the potential of this approach to stemming in information retrieval.
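The abstract does not spell out either model, so the following is only a minimal sketch of the general idea of learning stems from word statistics alone, under our own assumptions. It implements letter successor variety (Hafer and Weiss, 1974), an older statistical segmentation heuristic that needs no language-specific suffix rules; it is not either of the chapter's two algorithms. The toy vocabulary, the end-of-word marker `#`, the minimum stem length of three letters and the "first local peak" cut rule are all hypothetical choices made for the example.

```python
from collections import defaultdict

END = "#"  # end-of-word marker; assumed not to occur inside words

# Hypothetical toy vocabulary; a real stemmer would use the corpus lexicon.
VOCABULARY = [
    "connect", "connected", "connecting", "connection", "connects",
    "play", "played", "playing", "plays",
]


def successor_table(vocabulary):
    """Map each prefix in the vocabulary to the set of symbols that follow it."""
    table = defaultdict(set)
    for word in vocabulary:
        marked = word + END
        for i in range(1, len(marked)):
            table[marked[:i]].add(marked[i])
    return table


def successor_varieties(word, table):
    """Successor variety of word[:1], word[:2], ..., word[:len(word)]."""
    return [len(table.get(word[:i], ())) for i in range(1, len(word) + 1)]


def stem(word, table, min_stem=3):
    """Cut after the first prefix (of at least min_stem letters) whose
    successor variety is a strict local peak; otherwise keep the word."""
    sv = successor_varieties(word, table)
    for i in range(max(min_stem, 1) - 1, len(sv) - 1):
        left = sv[i - 1] if i > 0 else 0
        if sv[i] > left and sv[i] > sv[i + 1]:
            return word[: i + 1]
    return word


if __name__ == "__main__":
    table = successor_table(VOCABULARY)
    for word in VOCABULARY:
        print(word, "->", stem(word, table))
    # each "connect..." word maps to "connect", each "play..." word to "play"
```

In this vocabulary the prefix "connect" is followed by four different symbols (e, i, s and the end marker) while its neighbours are followed by one or two, so the cut falls there without any hand-written suffix list. Other cut rules are possible; replacing the raw counts with estimated probabilities, for example the entropy of the successor-letter distribution, would move the sketch closer to the probabilistic setting the abstract describes.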

Author information

Authors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131 Padova, Italy
Massimo Melucci & Nicola Orio

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131 Padova, Italy
Maristella Agosti

About this chapter

Cite this chapter

Melucci, M., Orio, N. (2008). Two Algorithms for Probabilistic Stemming. In: Agosti, M. (ed.) Information Access through Search Engines and Digital Libraries. The Information Retrieval Series, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75134-2_4

DOI: https://doi.org/10.1007/978-3-540-75134-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75133-5
Online ISBN: 978-3-540-75134-2
eBook Packages: Computer Science, Computer Science (R0)