Abstract
SANTI-morf is a new morphological annotation system for Indonesian, implemented using NooJ [1, 2]. It is designed with a multi-module pipeline architecture whose modules are the Annotator, the Improver, the Disambiguator, and the Guesser. The Guesser, as its name suggests, provides best guesses for words the Annotator fails to analyze. Because of the complexity of Indonesian morphology, multiple layers of rules are needed to guess the morphological structures of unknown polymorphemic and monomorphemic words. These rules are incorporated into five morphological grammars, which are applied in a pipeline according to their priorities. Each grammar contains two layers of rules. The first-layer rules are prioritized and therefore end with a +UNAMB operator. The second-layer rules apply only when the first-layer rules fail to find any match, and are therefore constructed without a +UNAMB operator. Reflecting on the complexity of this setup, I suggest an alternative way to set priorities, which I simulate in this paper. I argue that with the proposed alternative, NooJ users can organize rules with multiple priorities in just one grammar file.
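The layered priority scheme described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than as NooJ grammars, and it is not SANTI-morf's implementation: the grammar names, the toy prefix rules, the output tag strings, and the helper names (`prefix_rule`, `whole_word_rule`, `apply_grammar`, `guess`) are all hypothetical. The sketch also assumes that a prioritized first-layer match blocks every other analysis, and that a grammar with any match blocks the lower-priority grammars after it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A rule maps a word to one guessed analysis, or None when it does not match.
Rule = Callable[[str], Optional[str]]


def prefix_rule(prefix: str, min_root: int) -> Rule:
    """Hypothetical toy rule: split off `prefix` if at least `min_root` letters remain."""
    def rule(word: str) -> Optional[str]:
        root = word[len(prefix):]
        if word.startswith(prefix) and len(root) >= min_root:
            return f"{prefix}+PREFIX {root}+ROOT"
        return None
    return rule


def whole_word_rule(word: str) -> Optional[str]:
    """Hypothetical toy rule: treat the whole word as a single root."""
    return f"{word}+ROOT"


@dataclass
class Grammar:
    name: str
    first_layer: List[Rule] = field(default_factory=list)   # prioritized rules
    second_layer: List[Rule] = field(default_factory=list)  # fallback rules


def apply_grammar(grammar: Grammar, word: str) -> List[str]:
    # First layer (assumption): the first match wins and blocks everything else.
    for rule in grammar.first_layer:
        analysis = rule(word)
        if analysis is not None:
            return [analysis]
    # Second layer: reached only when the first layer found nothing.
    # Every match is kept, so the result may be ambiguous.
    candidates = (rule(word) for rule in grammar.second_layer)
    return [a for a in candidates if a is not None]


def guess(word: str, grammars: List[Grammar]) -> List[str]:
    """Try grammars in priority order and stop at the first one that matches."""
    for grammar in grammars:
        analyses = apply_grammar(grammar, word)
        if analyses:
            return analyses
    return []


# Two illustrative grammars only; the system described above uses five.
GRAMMARS = [
    Grammar("polymorphemic",
            first_layer=[prefix_rule("ber", 3)],
            second_layer=[prefix_rule("me", 3), prefix_rule("mem", 3)]),
    Grammar("monomorphemic", second_layer=[whole_word_rule]),
]
```

Under these assumptions, `guess("berlari", GRAMMARS)` returns a single analysis from the prioritized layer, `guess("membaca", GRAMMARS)` returns two competing second-layer analyses, and `guess("rumah", GRAMMARS)` falls through to the lower-priority grammar.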
References
Silberztein, M.: NooJ Manual (2003). www.nooj4nlp.net
Silberztein, M.: Formalizing Natural Languages: The NooJ Approach. Wiley, London (2016)
Šmerk, P., Sojka, P., Horák, A.: Towards Czech morphological guesser. In: Proceedings of Recent Advances in Slavonic Natural Language Processing, Brno, pp. 1–4 (2009)
Larasati, S.D., Kuboň, V., Zeman, D.: Indonesian morphology tool (MorphInd): towards an Indonesian corpus. In: Mahlow, C., Piotrowski, M. (eds.) SFCM 2011. CCIS, vol. 100, pp. 119–129. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23138-4_8
Segalovich, I.: A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine. In: MLMTA, Las Vegas, pp. 273–280 (2003)
Harman, D.: How effective is suffixing. J. Am. Soc. Inf. Sci. 42(1), 7–15 (1991)
Hull, D.-A.: Stemming algorithms: a case study for detailed evaluation. J. Am. Soc. Inf. Sci. 47(1), 70–84 (1996)
Prihantoro: SANTI-morf: a new morphological annotation system for Indonesian (a PhD thesis: forthcoming). Lancaster University Press, Lancaster (2021)
Prihantoro: The morphological annotation of reduplication-circumfix intersection in Indonesian. In: Formalising Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2020. Communications in Computer and Information Science, Zagreb, pp. 37–48 (2021)
Prihantoro: Tweaking NooJ’s resources to export morpheme-level or intra-word annotations. In: Bigey, M., Richeton, A., Silberztein, M., Thomas, I. (eds.) NooJ 2021, pp. 3–14. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92861-2_1
Mueller, F.: Indonesian morphology. In: Morphologies of Asia and Africa, pp. 1207–1230. Eisenbrauns, Winona Lake (2007)
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Prihantoro (2022). The Architecture of SANTI-Morf’s Guesser Module. In: González, M., Reyes, S.S., Rodrigo, A., Silberztein, M. (eds) Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2022. Communications in Computer and Information Science, vol 1758. Springer, Cham. https://doi.org/10.1007/978-3-031-23317-3_1
DOI: https://doi.org/10.1007/978-3-031-23317-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23316-6
Online ISBN: 978-3-031-23317-3
eBook Packages: Computer Science (R0)