Stemming

Synonyms

Suffix stripping; Suffixing; Affix removal; Word conflation

Definition

Stemming is a process by which word endings or other affixes are removed or modified in order that word forms which differ in non-relevant ways may be merged and treated as equivalent. A computer program which performs such a transformation is referred to as a stemmer or stemming algorithm. The output of a stemming algorithm is known as a stem.
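
As a rough illustration of this definition, the sketch below removes word endings so that several forms of one word collapse to a single stem. It is a toy example, not any published stemming algorithm: the suffix list `SUFFIXES`, the threshold `MIN_STEM_LEN`, and the function name `toy_stem` are all assumptions made for the demonstration.

```python
# Toy suffix-stripping stemmer: an illustrative sketch only.
# SUFFIXES and MIN_STEM_LEN are assumptions chosen for this demo,
# not rules taken from any published stemming algorithm.
SUFFIXES = ("ions", "ion", "ings", "ing", "ed", "s")
MIN_STEM_LEN = 3


def toy_stem(word: str) -> str:
    """Remove the longest listed suffix that leaves at least MIN_STEM_LEN characters."""
    word = word.lower()
    # Try longer suffixes first so that "ions" is preferred over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


forms = ["connect", "connected", "connecting", "connects", "connections"]
stems = {form: toy_stem(form) for form in forms}
print(stems)  # every form maps to the stem "connect"
```

The minimum stem length is one simple guard against stripping too much: with it, a short word such as "sing" is left unchanged rather than reduced to "s".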

Historical Background

The need for stemming first arose in the field of information retrieval (IR), where queries containing search terms need to be matched against document surrogates containing index terms. With the development of computer-based systems for IR, the problem immediately arose that a small difference in form between a search term and an index term could result in a failure to retrieve some relevant documents. Thus, if a query used the term “explosion” and a document was indexed by the term “explosives,” there would be no match on this term (whether or...
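
The mismatch described above can be sketched with a tiny index. In this hedged example, the documents, the ending list `ENDINGS`, and the helper names `conflate`, `build_index`, and `search` are hypothetical; the point is only that exact matching misses "explosives" for the query "explosion", while matching on conflated forms does not.

```python
from collections import defaultdict

# Hypothetical conflation rule for this example only: strip one listed ending.
# Longer endings are listed before the shorter endings they contain.
ENDINGS = ("ives", "ive", "ions", "ion")


def conflate(term: str) -> str:
    """Lowercase the term and strip the first matching ending, if any."""
    term = term.lower()
    for ending in ENDINGS:
        if term.endswith(ending):
            return term[: -len(ending)]
    return term


def build_index(docs, normalize):
    """Map each normalized index term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[normalize(term)].add(doc_id)
    return index


def search(index, query_term, normalize):
    """Return the documents whose index terms match the normalized query term."""
    return index.get(normalize(query_term), set())


# Made-up document surrogates: each document is a list of index terms.
docs = {"doc1": ["explosives", "storage"], "doc2": ["gardening"]}

exact = build_index(docs, str.lower)
stemmed = build_index(docs, conflate)

print(search(exact, "explosion", str.lower))   # set(): no match on this term
print(search(stemmed, "explosion", conflate))  # {'doc1'}
```

The same normalizing function is applied to index terms and to query terms, so both "explosion" and "explosives" are looked up under the shared form "explos".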

Copyright information

© 2009 Springer Science+Business Media, LLC

About this entry

Cite this entry

Paice, C.D. (2009). Stemming. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_942
