Abstract:
Arabic is a highly inflected language, so stemming and root extraction remain a challenge for researchers. A new method is presented for extracting the stem and lemma of Arabic text. Stemming sometimes alters the semantics of a word, whereas lemmatization preserves its meaning. The approach is based on pattern extraction. It uses a special encoding that divides letters into original and non-original letters. Codes are automatically generated for each pattern and then matched against the input text to extract the root, pattern, and lemma of a word. A comparison with other methods shows promising results, with accuracy of up to 96%.
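
The abstract does not specify the encoding itself, so the following is only a minimal sketch of the general idea of pattern-based root extraction, not the paper's method. It assumes the conventional placeholder letters ف, ع, and ل mark the original (root) positions in a morphological pattern, and that every other letter in the pattern is a non-original letter that must match literally. The names `ROOT_SLOTS`, `encode_pattern`, `match`, `PATTERNS`, and `analyze` are hypothetical, and the three patterns are an illustrative sample. Lemma extraction is not covered.

```python
# Assumption: the letters of the template root "فعل" mark original (root) slots.
ROOT_SLOTS = "فعل"

def encode_pattern(pattern):
    """Turn a pattern into a code: None for a root slot, the letter itself otherwise."""
    return [None if ch in ROOT_SLOTS else ch for ch in pattern]

def match(word, code):
    """Return the root letters if the word fits the code, else None."""
    if len(word) != len(code):
        return None
    root = []
    for ch, slot in zip(word, code):
        if slot is None:
            root.append(ch)      # original letter: part of the root
        elif slot != ch:
            return None          # non-original letter must match exactly
    return "".join(root)

# Illustrative sample of patterns (assumed, not taken from the paper).
PATTERNS = ["فاعل", "مفعول", "فعل"]
CODES = {p: encode_pattern(p) for p in PATTERNS}

def analyze(word):
    """Return (root, pattern) for the first pattern whose code matches, else None."""
    for pattern, code in CODES.items():
        root = match(word, code)
        if root is not None:
            return root, pattern
    return None

print(analyze("كاتب"))
print(analyze("مكتوب"))
```

In this sketch both كاتب and مكتوب resolve to the root كتب, through the patterns فاعل and مفعول respectively. A real system would also need to handle diacritics, prefixes and suffixes, weak letters, and ambiguity between competing patterns, none of which this sketch attempts.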
Published in: 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010)
Date of Conference: 10-13 May 2010
Date Added to IEEE Xplore: 18 October 2010