Evaluating the Effectiveness of Thesaurus and Stemming Methods in Retrieving Malay Translated Al-Quran Documents

  • Conference paper
Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access (ICADL 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2911)

Abstract

Information technology has made information in many forms, such as text, images, and sound, widely accessible through computer-based search. As a result of this popularity and of advances in technology, there is increased interest in searching Malay text so that scholars and researchers can access databases online. Malay texts are scanned and stored in databases, ready for use by text retrieval systems that employ conflation methods to identify word variants in these databases. This paper evaluates the retrieval effectiveness of two conflation methods, stemming and thesaurus, in searching and retrieving relevant Malay translated Al-Quran documents based on users' natural-language query words. The Malay translated Al-Quran texts are stored in an inverted file structure. The retrieved documents are weighted and ranked using the inverse document frequency (idf) function. Retrieval effectiveness (E) is measured using standard recall (R) and precision (P). Experiments performed on the Malay translated Al-Quran documents show that a combined search using stemming and thesaurus improves retrieval effectiveness (E) and recall (R) but decreases precision (P).
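
The pipeline the abstract describes (conflating query words, looking them up in an inverted file, ranking by idf, and scoring with recall and precision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four toy documents, the one-entry thesaurus, the relevance set, and the `toy_stem` function (which only strips the clitics -lah, -kah, and -nya) are all hypothetical, and the idf form log(N/df) is one common variant that the abstract does not specify.

```python
import math
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in for a Malay stemmer: strips a few clitics only."""
    for suffix in ("lah", "kah", "nya"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_inverted_index(docs, stem):
    """Map each (stemmed) term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[stem(term)].add(doc_id)
    return index

def expand_query(terms, stem, thesaurus):
    """Add thesaurus entries for each query term, then stem everything."""
    expanded = set()
    for term in terms:
        expanded.add(stem(term))
        expanded.update(stem(s) for s in thesaurus.get(term, ()))
    return expanded

def rank(query_terms, index, n_docs):
    """Score each document by the summed idf of the query terms it contains."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, set())
        if not postings:
            continue
        weight = math.log(n_docs / len(postings))  # assumed idf variant
        for doc_id in postings:
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def recall_precision(retrieved, relevant):
    """Standard recall (R) and precision (P) over sets of document ids."""
    hits = len(set(retrieved) & set(relevant))
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical toy collection and relevance judgments, for illustration only.
docs = {
    1: "dirikanlah solat",
    2: "sembahyang lima waktu",
    3: "makanan yang halal",
    4: "solatnya khusyuk",
}
thesaurus = {"solat": ["sembahyang"]}
relevant = {1, 2, 4}
query = ["solat"]

# Baseline: exact word matching, no stemming and no thesaurus.
identity = lambda w: w
plain_index = build_inverted_index(docs, identity)
baseline = rank(expand_query(query, identity, {}), plain_index, len(docs))

# Combined search: stemming plus thesaurus expansion.
stem_index = build_inverted_index(docs, toy_stem)
combined = rank(expand_query(query, toy_stem, thesaurus), stem_index, len(docs))

baseline_rp = recall_precision(baseline, relevant)
combined_rp = recall_precision(combined, relevant)
```

On this toy data the baseline retrieves only the document that contains the exact word, while the combined search also reaches the inflected and synonymous forms, so recall rises. The numbers say nothing about the paper's reported results. In particular, the precision loss noted in the abstract would only appear on a collection where the expanded terms also match non-relevant documents.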

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bakar, Z.A., Rahman, N.A. (2003). Evaluating the Effectiveness of Thesaurus and Stemming Methods in Retrieving Malay Translated Al-Quran Documents. In: Sembok, T.M.T., Zaman, H.B., Chen, H., Urs, S.R., Myaeng, SH. (eds) Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access. ICADL 2003. Lecture Notes in Computer Science, vol 2911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24594-0_67

  • DOI: https://doi.org/10.1007/978-3-540-24594-0_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20608-8

  • Online ISBN: 978-3-540-24594-0

  • eBook Packages: Springer Book Archive
