Stemming Hausa text: using affix-stripping rules and reference look-up

  • Project Notes
Language Resources and Evaluation

Abstract

Stemming is the process of reducing a derived or inflected word to its root or stem by stripping its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. Currently, a few stemming algorithms have been developed for languages such as English, Arabic, Turkish, Malay and Amharic. Unfortunately, no algorithm has been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference look-up. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference look-up consisting of 1500 Hausa root words. The over-stemming index, under-stemming index, stemmer weight, word stemmed factor, correctly stemmed words factor and average words conflation factor were calculated to determine the effect of reference look-up on the strength and accuracy of the stemmer. It was observed that reference look-up helped reduce both over-stemming and under-stemming errors, increased accuracy, and tended to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed and directions for future research are identified.
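The general idea of combining stepwise affix stripping with a root look-up can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not list the 78 rules, the contents of the 1500-word reference list, or the exact point at which the look-up is consulted. The names `RULE_STEPS`, `ROOT_LEXICON`, `MIN_STEM_LEN`, `strip_one` and `stem` are hypothetical, the two toy steps stand in for the paper's four, and the affixes and words are placeholders rather than a vetted description of Hausa morphology.

```python
# Illustrative sketch only: toy rules and a toy lexicon, NOT the paper's
# 78 rules or its 1500-root reference list.

MIN_STEM_LEN = 2  # assumed guard: never strip down to fewer than 2 letters

# Hypothetical rules grouped into ordered steps (the paper uses 4 steps).
RULE_STEPS = [
    [("suffix", "nsa"), ("suffix", "nta")],
    [("suffix", "je"), ("suffix", "una")],
]

# Hypothetical stand-in for the reference look-up of known root words.
ROOT_LEXICON = frozenset({"gida", "hanta"})


def strip_one(word, rules):
    """Apply the first matching rule of one step, trying longer affixes first."""
    for kind, affix in sorted(rules, key=lambda rule: -len(rule[1])):
        if kind == "suffix" and word.endswith(affix):
            candidate = word[: -len(affix)]
        elif kind == "prefix" and word.startswith(affix):
            candidate = word[len(affix):]
        else:
            continue
        if len(candidate) >= MIN_STEM_LEN:
            return candidate
    return word  # no rule applied in this step


def stem(word, lexicon=None):
    """Run the steps in order; if a lexicon is given, stop at a known root.

    Assumption: the look-up is checked before each step. The abstract does
    not say where in the pipeline the paper consults it.
    """
    word = word.lower()
    for rules in RULE_STEPS:
        if lexicon is not None and word in lexicon:
            return word
        word = strip_one(word, rules)
    return word


if __name__ == "__main__":
    for w in ("gidansa", "gidaje", "hanta"):
        print(w, "rules only:", stem(w), "| with look-up:", stem(w, ROOT_LEXICON))
```

In this toy setup the rules alone cut "hanta" down to "ha", while the look-up leaves it intact because it is already listed as a root. That mirrors only one of the reported effects (fewer over-stemming errors at the cost of a weaker stemmer); the sketch does not attempt to reproduce the reduction in under-stemming errors or any of the evaluation indices.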

Acknowledgments

We gratefully acknowledge the support of Paul Newman, Indiana University, USA, for his substantive comments and constructive criticism.

Author information

Corresponding author

Correspondence to Andrew Bimba.

About this article

Cite this article

Bimba, A., Idris, N., Khamis, N. et al. Stemming Hausa text: using affix-stripping rules and reference look-up. Lang Resources & Evaluation 50, 687–703 (2016). https://doi.org/10.1007/s10579-015-9311-x

  • DOI: https://doi.org/10.1007/s10579-015-9311-x
