Enhanced Rules Application Order to Stem Affixation, Reduplication and Compounding Words in Malay Texts

  • Conference paper
Knowledge Management and Acquisition for Intelligent Systems (PKAW 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9806)

Abstract

A word stemmer is an automated program that removes affixes, clitics and particles from derived words based on the morphological structure of a specific natural language. Stemmers are widely used for text preprocessing in artificial intelligence applications, and how accurately a stemmer handles derived words influences the performance of information retrieval, text mining and text categorization applications. Although various stemming approaches have been proposed in past research, existing word stemmers for the Malay language still suffer from stemming errors. Moreover, existing stemmers only partially cover Malay morphology: they focus on affixation words rather than handling affixation, reduplication and compounding words together. This paper therefore proposes an enhanced word stemmer, called enhanced rule application order, that combines rule-based affix removal with dictionary lookup. It is able to stem affixation, reduplication and compounding words while also addressing possible stemming errors. The paper also examines the possible root causes of affixation, reduplication and compounding stemming errors that can occur during the stemming process. The experimental results indicate that the proposed stemmer, using the enhanced rule application order, stems affixation, reduplication and compounding words with better stemming accuracy.
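
To make the general idea concrete, the following is a minimal sketch of a stemmer that combines rule-based affix removal with dictionary lookup. It is not the rule order proposed in the paper, which the abstract does not specify. The root dictionary, the affix lists, and the function names (`ROOTS`, `PREFIXES`, `SUFFIXES`, `stem_simple`, `stem`) are all hypothetical and far smaller than a real Malay lexicon would require. The sketch assumes that reduplicated words are written with a hyphen and that compound words are stored as dictionary entries.

```python
# Toy root-word dictionary (assumption): a real stemmer would load a full Malay lexicon.
# Compound words such as "tanggungjawab" are stored as single entries.
ROOTS = {"main", "ajar", "buku", "makan", "baca", "berat", "tanggungjawab"}

# Illustrative, incomplete affix lists (assumption). Longer prefixes come first
# so that "mem" is tried before "me".
PREFIXES = ("meng", "peng", "mem", "pem", "men", "pen",
            "ber", "ter", "per", "me", "pe", "di", "ke", "se")
SUFFIXES = ("kan", "an", "i")


def stem_simple(word, roots=ROOTS):
    """Stem a single non-hyphenated word.

    The dictionary is checked before any rule is applied, so a root such as
    "makan" is not wrongly reduced to "mak" by the "-an" suffix rule.
    """
    if word in roots:
        return word
    prefix_opts = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_opts = [""] + [s for s in SUFFIXES if word.endswith(s)]
    for p in prefix_opts:
        for s in suffix_opts:
            core = word[len(p):len(word) - len(s)]
            # Accept a candidate only if the dictionary confirms it is a root.
            if core in roots:
                return core
    return word  # no rule produced a known root; leave the word unchanged


def stem(word, roots=ROOTS):
    """Stem an affixation, reduplication or compounding word."""
    word = word.lower()
    if "-" in word:
        # Reduplication: stem each half separately.
        stems = [stem_simple(part, roots) for part in word.split("-")]
        # Naive fallback (assumption): if the halves disagree, keep the first stem.
        return stems[0]
    return stem_simple(word, roots)


if __name__ == "__main__":
    for w in ["bermain", "makanan", "makan", "buku-buku",
              "bermain-main", "bertanggungjawab", "diajarkan"]:
        print(w, "->", stem(w))
```

In this sketch the order in which checks are applied matters: looking up the dictionary first guards against overstemming, while trying prefix-only and suffix-only candidates before combined removal keeps the longest valid root. Real Malay morphology also involves spelling changes at prefix boundaries and rhythmic reduplication, which this toy version does not handle.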

Acknowledgments

The authors would like to thank the Editor-in-Chief and the anonymous reviewers of the manuscript for their valuable comments and suggestions. This research was funded by Universiti Teknologi Malaysia’s Research University Grant PY/2014/02479.

Author information

Corresponding author

Correspondence to Mohamad Nizam Kassim.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kassim, M.N., Maarof, M.A., Zainal, A., Abdul Wahab, A. (2016). Enhanced Rules Application Order to Stem Affixation, Reduplication and Compounding Words in Malay Texts. In: Ohwada, H., Yoshida, K. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2016. Lecture Notes in Computer Science, vol 9806. Springer, Cham. https://doi.org/10.1007/978-3-319-42706-5_6

  • DOI: https://doi.org/10.1007/978-3-319-42706-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42705-8

  • Online ISBN: 978-3-319-42706-5

  • eBook Packages: Computer Science (R0)