Experiments in Malay information retrieval | IEEE Conference Publication | IEEE Xplore

Experiments in Malay information retrieval


Abstract:

There have been very few studies on the use of conflation algorithms for indexing and retrieval of Malay documents. The two main classes of conflation algorithms are string-similarity algorithms and stemming algorithms. Stemming is used in information retrieval systems to reduce variant word forms to common roots in order to improve retrieval effectiveness. As in other languages, there is a need for an effective stemming algorithm for the indexing and retrieval of Malay documents. Similarly, there has been little research on n-gram string-similarity measures for Malay. We have experimented with applying stemming and string-similarity matching to the retrieval of verses from the Al-Quran in order to evaluate their effectiveness. Before retrieval effectiveness can be evaluated, an experimental data set needs to be developed, comprising a collection of documents, a set of queries, and their relevance judgements. In this paper we describe the development of the experimental data set and the application of stemming and similarity-matching algorithms in retrieving verses from the Al-Quran. Inherent characteristics of n-grams and several variations of the experiments performed on the queries and documents are discussed. The variations are: non-stemmed queries and non-stemmed documents; stemmed queries and non-stemmed documents; and stemmed queries and stemmed documents. Further experiments are then carried out by removing the most frequently occurring n-gram. The Dice coefficient is used both as a threshold and as a weight in ranking the retrieved documents. In addition to Dice coefficients, inverse document frequency weights are also used to rank documents. An interpolation technique and standard recall-precision functions are used to evaluate retrieval effectiveness.
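
As a rough illustration of the n-gram string-similarity matching mentioned in the abstract, the following Python sketch computes a Dice coefficient over character bigrams and uses it both as a cut-off threshold and as a ranking weight. It is not the authors' implementation. The abstract does not state the n-gram length, whether words are padded, or whether repeated n-grams are counted, so the bigram size, the use of distinct n-grams, the 0.5 threshold, and the function names `char_ngrams`, `dice`, and `rank` are all assumptions made for this example, as are the sample Malay words.

```python
def char_ngrams(word, n=2):
    """Return the set of distinct character n-grams in a word (assumed: no padding)."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient 2|A & B| / (|A| + |B|) over the distinct n-grams of a and b."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # words shorter than n share no n-grams
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def rank(query_term, index_terms, threshold=0.5):
    """Keep terms scoring at or above the threshold, highest Dice score first."""
    scored = [(dice(query_term, term), term) for term in index_terms]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample terms: "makan" (eat), "makanan" (food), "minum" (drink).
    for score, term in rank("makan", ["minum", "makanan", "makan"]):
        print(f"{term}: {score:.3f}")
```

Under these assumptions, "makan" and "makanan" share four of their distinct bigrams and so match without any stemming, while "minum" shares none and falls below the threshold.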
Date of Conference: 17-19 July 2011
Date Added to IEEE Xplore: 19 September 2011
Conference Location: Bandung, Indonesia
