Abstract

SANTI-morf is a new morphological annotation system for Indonesian, implemented in NooJ [1, 2]. It is designed as a multi-module pipeline consisting of four modules: the Annotator, the Improver, the Disambiguator, and the Guesser. The Guesser, as its name suggests, provides best guesses for words that the Annotator fails to analyze. Because of the complexity of Indonesian morphology, multiple layers of rules are needed to guess the morphological structures of unknown polymorphemic and monomorphemic words. These rules are incorporated into five morphological grammars, which are applied in a pipeline according to their priorities. Each grammar contains two layers of rules. First-layer rules are prioritized and therefore end with a +UNAMB operator. Second-layer rules apply only when the first-layer rules fail to find any match, so they are written without a +UNAMB operator. Reflecting on the complexity of this setup, I suggest an alternative way to set priorities, which I simulate in this paper. I argue that with the proposed alternative, NooJ users can organize rules with multiple priorities in a single grammar file.
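
The layered priority scheme described above can be illustrated with a small simulation. The following is only a minimal sketch in Python, not NooJ code and not the actual SANTI-morf grammars: the `Rule` and `Grammar` classes, the `guess` function, the regular expressions, and the two toy grammars are all hypothetical and introduced purely for illustration. The effect of +UNAMB is modeled simply as "if any first-layer rule matches, keep only those analyses", and the pipeline is modeled as "the first grammar that yields an analysis wins".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical guessing rule: a label plus a regex with named groups."""
    name: str
    pattern: str


@dataclass(frozen=True)
class Grammar:
    """A hypothetical grammar with two layers of rules."""
    name: str
    first_layer: tuple   # prioritized rules (stand-in for rules ending in +UNAMB)
    second_layer: tuple  # fallback rules (stand-in for rules without +UNAMB)


def apply_layer(rules, word):
    """Return every analysis that the given rules produce for the whole word."""
    analyses = []
    for rule in rules:
        match = re.fullmatch(rule.pattern, word)
        if match:
            analyses.append((rule.name, match.groupdict()))
    return analyses


def guess(word, grammars):
    """Try grammars in priority order; inside each, try layer 1 before layer 2."""
    for grammar in grammars:
        first = apply_layer(grammar.first_layer, word)
        if first:
            # A first-layer match blocks everything with lower priority.
            return grammar.name, first
        second = apply_layer(grammar.second_layer, word)
        if second:
            return grammar.name, second
    return None


# Toy grammars (assumptions for illustration only, far simpler than real rules).
GRAMMARS = (
    Grammar(
        name="polymorphemic",
        first_layer=(Rule("circumfix ke- -an", r"ke(?P<root>\w{3,})an"),),
        second_layer=(
            Rule("prefix ke-", r"ke(?P<root>\w{3,})"),
            Rule("suffix -an", r"(?P<root>\w{3,})an"),
        ),
    ),
    Grammar(
        name="monomorphemic",
        first_layer=(),
        second_layer=(Rule("bare root", r"(?P<root>\w+)"),),
    ),
)

if __name__ == "__main__":
    for word in ("kebaikan", "makanan", "dan"):
        print(word, "->", guess(word, GRAMMARS))
```

In this sketch, "kebaikan" is caught by the first-layer circumfix rule, so the lower-priority prefix and suffix rules never contribute competing analyses; "makanan" falls through to the second layer; and "dan" is only reached by the lower-priority monomorphemic grammar. Keeping both layers inside one `Grammar` object is loosely analogous to the idea of organizing rules with several priorities in a single grammar file.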

References

  1. Silberztein, M.: NooJ Manual (2003). www.nooj4nlp.net

  2. Silberztein, M.: Formalizing Natural Languages: The NooJ Approach. Wiley, London (2016)

  3. Šmerk, P., Sojka, P., Horák, A.: Towards Czech morphological guesser. In: Proceedings of Recent Advances in Slavonic Natural Language Processing, Brno, pp. 1–4 (2009)

  4. Larasati, S.D., Kuboň, V., Zeman, D.: Indonesian morphology tool (MorphInd): towards an Indonesian corpus. In: Mahlow, C., Piotrowski, M. (eds.) SFCM 2011. CCIS, vol. 100, pp. 119–129. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23138-4_8

  5. Segalovich, I.: A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine. In: MLMTA, Las Vegas, pp. 273–280 (2003)

  6. Harman, D.: How effective is suffixing? J. Am. Soc. Inf. Sci. 42(1), 7–15 (1991)

  7. Hull, D.-A.: Stemming algorithms: a case study for detailed evaluation. J. Am. Soc. Inf. Sci. 47(1), 70–84 (1996)

  8. Prihantoro: SANTI-morf: a new morphological annotation system for Indonesian (a PhD thesis: forthcoming). Lancaster University Press, Lancaster (2021)

  9. Prihantoro: The morphological annotation of reduplication-circumfix intersection in Indonesian. In: Formalising Natural Languages: Applications to Natural Language Processing and Digital Humanities (2021)

  10. NooJ 2020. Communications in Computer and Information Science. CCIS, Zagreb, pp. 37–48 (2021)

  11. Prihantoro: Tweaking NooJ’s resources to export morpheme-level or intra-word annotations. In: Bigey, M., Richeton, A., Silberztein, M., Thomas, I. (eds.) NooJ 2021, pp. 3–14. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92861-2_1

  12. Mueller, F.: Indonesian morphology. In: Morphologies of Asia and Africa, pp. 1207–1230. Eisenbrauns, Winona Lake (2007)

Author information

Corresponding author

Correspondence to Prihantoro.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Prihantoro (2022). The Architecture of SANTI-Morf’s Guesser Module. In: González, M., Reyes, S.S., Rodrigo, A., Silberztein, M. (eds) Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2022. Communications in Computer and Information Science, vol 1758. Springer, Cham. https://doi.org/10.1007/978-3-031-23317-3_1

  • DOI: https://doi.org/10.1007/978-3-031-23317-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23316-6

  • Online ISBN: 978-3-031-23317-3

  • eBook Packages: Computer Science, Computer Science (R0)
