Abstract
We present a finite state morphology system augmented with typed feature structures as weights on transitions. This mechanism allows the use of highly efficient finite state approaches for morphological analysis and generation, while providing the rich linguistic descriptions often used in Machine Translation systems. Using a semiring interpretation, the weight of a morphological analysis result represents the possible linguistic interpretations of an input word, while the resulting character string itself represents the lemma of the input. Long-distance phenomena and infixation can be handled in an easy and elegant manner, simultaneously providing a seamless interface to subsequent linguistic processing modules. Two extensions to the basic model are discussed: the incorporation of lexical knowledge into the finite state transducer and a transformation that renders unification-based finite state models as efficient as those employing other weight structures. The model is applied to morphological operations in a Persian–English Machine Translation system.
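The idea of carrying feature structures as transition weights can be illustrated with a minimal sketch. It is not the system described in the article: it assumes flat, untyped attribute-value maps in place of typed feature structures, a hand-written three-state transducer over a toy English verb, and multi-character arc labels for brevity. The names `unify`, `ARCS`, `FINAL`, and `analyze` are hypothetical. Unification plays the role of the semiring product along a path, and the list of results collects the alternatives.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a feature structure is a flat attribute-value map (no types, no nesting).
FS = Dict[str, str]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Unify two flat feature structures; return None if any feature clashes."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            return None
        out[key] = value
    return out


# Hypothetical toy transducer: state -> list of (input, output, weight, next state).
Arc = Tuple[str, str, FS, int]
ARCS: Dict[int, List[Arc]] = {
    0: [("walk", "walk", {"cat": "verb"}, 1)],
    1: [
        ("s", "", {"person": "3", "number": "sg", "tense": "pres"}, 2),
        ("ed", "", {"tense": "past"}, 2),
        ("", "", {"form": "base"}, 2),
    ],
}
FINAL = {2}


def analyze(word: str) -> List[Tuple[str, FS]]:
    """Return (lemma, feature structure) pairs for every accepting path."""
    results: List[Tuple[str, FS]] = []
    stack: List[Tuple[int, int, str, FS]] = [(0, 0, "", {})]
    while stack:
        state, pos, lemma, weight = stack.pop()
        if state in FINAL and pos == len(word):
            results.append((lemma, weight))
        for inp, out, arc_weight, nxt in ARCS.get(state, []):
            if word.startswith(inp, pos):
                # Path weight is the unification of the arc weights seen so far.
                combined = unify(weight, arc_weight)
                if combined is not None:  # a clash prunes this path
                    stack.append((nxt, pos + len(inp), lemma + out, combined))
    return results


print(analyze("walks"))
```

In this sketch the output string is the lemma and the accumulated weight is the linguistic description, mirroring the division of labor stated in the abstract. A feature clash anywhere along a path discards that path, which is how a constraint set early in a word could rule out an incompatible affix later on.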
About this article
Cite this article
Amtrup, J.W. Morphology in Machine Translation Systems: Efficient Integration of Finite State Transducers and Feature Structure Descriptions. Mach Translat 18, 217–238 (2003). https://doi.org/10.1007/s10590-004-2476-5
DOI: https://doi.org/10.1007/s10590-004-2476-5