In this paper, we present a stemming methodology that combines a hand-crafted rule-based system with data-driven machine learning approaches. The rule-based system models phenomena of Latvian, a highly inflectional language, in a linguistically sound and consistent way. While the hand-crafted stemmer can be used on its own, it can also supply training data for our statistical modeling. The statistical modeling relies on two assumptions that are quite natural in stemming and in many other NLP applications, such as grapheme-to-phoneme conversion and lemmatization: the output sequence is not longer than the input sequence, and the characters of the input and output sequences appear in a ‘similar’ order. Under these conditions, we train several machine learning algorithms and show that very good results for stemming in Latvian can be obtained by combining them via bootstrapping and ensemble-of-classifiers methods.
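To make the two assumptions concrete, the following minimal Python sketch treats the simplest case, in which the stem is obtained purely by deleting characters from the word. The output is then never longer than the input and the character order is preserved, so each input character can carry a keep or delete label that a classifier could learn to predict. This is an illustration under that stated assumption, not the method used in the paper: the function names `keep_delete_labels` and `apply_labels` and the word/stem pair are hypothetical and chosen only for the example.

```python
def keep_delete_labels(word, stem):
    """Label each character of `word` as 'K' (keep) or 'D' (delete).

    Assumes the stem is a subsequence of the word, i.e. it can be
    produced by deleting characters only. Returns None otherwise.
    """
    labels = []
    matched = 0  # number of stem characters matched so far
    for ch in word:
        if matched < len(stem) and ch == stem[matched]:
            labels.append("K")
            matched += 1
        else:
            labels.append("D")
    # Every stem character must have been matched, in order.
    return labels if matched == len(stem) else None


def apply_labels(word, labels):
    """Rebuild the stem by keeping only the characters labelled 'K'."""
    return "".join(ch for ch, label in zip(word, labels) if label == "K")


# Illustrative word/stem pair (hypothetical, not taken from the paper).
labels = keep_delete_labels("grāmatas", "grāmat")
stem = apply_labels("grāmatas", labels)
```

Under this framing, the pairs produced by a rule-based stemmer could be converted into per-character training labels, and a pair for which `keep_delete_labels` returns `None` would signal that the deletion-only assumption does not hold for that word.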