Abstract
Sequential transducers, introduced by Schützenberger [5], have advantageous computational properties. A sequential transducer is deterministic with respect to its input. Not every transducer can be sequentialized, but when one can be, the sequential form is optimal in time and often in space. This article extends the subsequentialization algorithm of Mohri [3,4] to previously untreated classes of transducers. We
- change the representation of final p-strings,
- extend the sequentialization to input ε labels and their closures,
- handle the unknown symbol.
Mohri uses final p-strings to express p-subsequentiality. We convert them to real arcs and states to have a more uniform representation and to maintain the two-sided applicability of the transducer. This change is of linear complexity.
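The conversion of final p-strings into arcs and states can be illustrated with a minimal sketch. It is one plausible reading, not the article's implementation: the `Transducer` class, the `materialize_final_strings` and `outputs` helpers, the empty string standing for ε, and the choice of one ε-input arc per output symbol are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

EPS = ""  # assumed representation of the ε label


@dataclass
class Transducer:
    start: int
    arcs: dict = field(default_factory=dict)           # state -> [(input, output, target)]
    finals: set = field(default_factory=set)
    final_strings: dict = field(default_factory=dict)  # state -> list of final output strings
    n_states: int = 0


def materialize_final_strings(t):
    """Replace each final string by a chain of ε-input arcs ending in a new final state."""
    for state, strings in list(t.final_strings.items()):
        keep_final = False
        for w in strings:
            if w == "":
                keep_final = True  # an empty final string needs no extra arcs
                continue
            src = state
            for symbol in w:       # one new arc and state per output symbol
                dst = t.n_states
                t.n_states += 1
                t.arcs.setdefault(src, []).append((EPS, symbol, dst))
                src = dst
            t.finals.add(src)
        if strings and not keep_final:
            t.finals.discard(state)
    t.final_strings.clear()
    return t


def outputs(t, word):
    """All outputs for `word` (assumes there are no ε-input cycles)."""
    results, stack = set(), [(t.start, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in t.finals:
            results.update(out + w for w in t.final_strings.get(state, [""]))
        for inp, o, dst in t.arcs.get(state, []):
            if inp == EPS:
                stack.append((dst, pos, out + o))
            elif pos < len(word) and word[pos] == inp:
                stack.append((dst, pos + 1, out + o))
    return results


# Hypothetical 2-subsequential example: 0 -a:x-> 1, state 1 has final strings "b" and "cd".
t = Transducer(start=0, n_states=2)
t.arcs[0] = [("a", "x", 1)]
t.finals = {1}
t.final_strings[1] = ["b", "cd"]

before = outputs(t, "a")
materialize_final_strings(t)
after = outputs(t, "a")
```

In this sketch the number of added arcs and states equals the total length of the final strings, which is consistent with the linear cost stated above.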
An ε-closure set and appropriate modifications in the subsequentialization algorithm of Mohri make it possible to handle transducers containing input-side ε labels. This does not require any intermediate transformation of the transducer.
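As a rough illustration of an ε-closure that carries outputs, the sketch below collects every (state, output) pair reachable from a state through input-ε arcs only. It is not the article's algorithm: the arc format, the `epsilon_closure` name, the empty string for ε, and plain string concatenation standing in for the output operation are assumptions, and the traversal assumes there is no ε-cycle that emits output.

```python
EPS = ""  # assumed representation of the ε label


def epsilon_closure(arcs, state):
    """Return all (state, accumulated output) pairs reachable via input-ε arcs."""
    closure = {(state, "")}
    stack = [(state, "")]
    while stack:
        q, out = stack.pop()
        for inp, o, dst in arcs.get(q, []):
            if inp != EPS:
                continue  # only input-ε arcs belong to the closure
            item = (dst, out + o)
            if item not in closure:
                closure.add(item)
                stack.append(item)
    return closure


# Hypothetical fragment: two different ε-paths lead from state 0 to state 2.
arcs = {
    0: [(EPS, "a", 1), (EPS, "", 2)],
    1: [(EPS, "b", 2)],
    2: [("x", "y", 3)],
}
closure = epsilon_closure(arcs, 0)
```

Here state 2 appears twice in the closure, once with output "ab" and once with the empty output. That is the kind of ε-ambiguity a subsequentialization step would then have to resolve together with the ordinary ambiguities.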
Our ε-closure modification solves a complexity problem in subsequentiable transducers that contain arcs with an ε label on either the input or the output side. An illustration of such cases is the transduction of a string of length n to another string of the same length (Fig. 1). Such a transducer can be ambiguous according to a rapidly growing function of n: a (modest) lower bound for the number of possible paths is $O\bigl(\binom{2n}{n}\bigr)$, for which the lower bound is $o(3^n)$; that is, such a transducer has an exponential number of ambiguous paths expressing the same mapping. If such a transducer is Kleene-starred (allowing repetitions of the input string), k repetitions will require more than $O\bigl(\binom{2n}{n}^k\bigr)$ recognition complexity. We know only a recursive form for the number of possible paths in the general case, but lower-bound approximations show that such cases rapidly become intractable. Such transducers can, however, be transformed by ε-sequentialization, which makes recognition complexity linear, $O(nk)$.
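The growth of the path count can be checked numerically. The sketch below assumes a simplified model in which every path freely interleaves n input-consuming steps with n output-producing steps; the transducer of Fig. 1 is not reproduced here, so this model and the name `interleaving_paths` are illustrative assumptions only.

```python
from math import comb


def interleaving_paths(n):
    """Count interleavings of n input steps and n output steps by dynamic programming."""
    # paths[i][j] = number of ways to have consumed i input symbols and emitted j output symbols
    paths = [[0] * (n + 1) for _ in range(n + 1)]
    paths[0][0] = 1
    for i in range(n + 1):
        for j in range(n + 1):
            if i > 0:
                paths[i][j] += paths[i - 1][j]
            if j > 0:
                paths[i][j] += paths[i][j - 1]
    return paths[n][n]


# The dynamic-programming count agrees with the central binomial coefficient.
counts = [(n, interleaving_paths(n), comb(2 * n, n)) for n in range(1, 8)]

# Smallest n for which binom(2n, n) exceeds 3**n.
first_n = next(n for n in range(1, 50) if comb(2 * n, n) > 3 ** n)

# Under the same model, k repetitions multiply the number of paths: binom(2n, n)**k.
paths_for_repetitions = comb(2 * 4, 4) ** 3
```

Under this model the count is exactly the central binomial coefficient, and it overtakes 3^n from n = 5 onward, since each increment of n multiplies it by 4 - 2/(n+1), which is at least 3.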
By using the ε-closure, the ambiguities stemming from input ε transitions can be handled, and both ordinary and ε-ambiguities are (sub)sequentialized in the same step, by local modifications. This extension does not increase the complexity of the original algorithm. The ε-closure operation is a semiring operation creating a set of directed acyclic graphs. Such ε-ambiguities may and do arise in finite-state compilers and tools.
The unknown symbol is an extension of the usual transducer notation to define special treatment for input symbols not in the input alphabet of the transducer [1]. It can be present, if handled specially, in sequentialization and in subsequent finite-state calculus operations and applications.
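One way to picture the unknown symbol during application is the sketch below. It applies a sequential (input-deterministic) transducer in a single left-to-right pass and routes any input symbol outside the alphabet to a dedicated arc. The `UNKNOWN` marker, the arc layout, the convention that an `UNKNOWN` output copies the input symbol, and the name `apply_sequential` are assumptions for illustration, not the notation of [1].

```python
UNKNOWN = "<?>"  # hypothetical marker for "any symbol not in the input alphabet"


def apply_sequential(arcs, finals, alphabet, start, word):
    """Apply a sequential transducer to `word`; return the output or None on failure."""
    state, out = start, []
    for symbol in word:
        # Symbols of the alphabet must match their own arc; only others use UNKNOWN.
        label = symbol if symbol in alphabet else UNKNOWN
        step = arcs.get(state, {}).get(label)
        if step is None:
            return None
        output, state = step
        # Assumed convention: an UNKNOWN output copies the symbol that was read.
        out.append(symbol if output == UNKNOWN else output)
    return "".join(out) if state in finals else None


# Hypothetical transducer: uppercase a and b, pass anything else through unchanged.
alphabet = {"a", "b"}
arcs = {0: {"a": ("A", 0), "b": ("B", 0), UNKNOWN: (UNKNOWN, 0)}}
finals = {0}

result = apply_sequential(arcs, finals, alphabet, 0, "abz")

# Without an UNKNOWN arc, a symbol outside the alphabet is rejected.
strict = apply_sequential({0: {"a": ("A", 0)}}, finals, alphabet, 0, "az")
```

Each input symbol triggers at most one dictionary lookup, so the pass is linear in the length of the input, as expected for a sequential transducer.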
References
[1] T. Gaál and L. Karttunen. Improving Mohri’s algorithm in the Xerox finite state calculus. In Proceedings of the 9th International Conference on Automata and Formal Languages (AFL’99), 1999. To appear in Publicationes Mathematicae.
[2] L. Karttunen and K.R. Beesley. Finite-State Morphology: Xerox Tools and Techniques. Cambridge University Press, Cambridge UK, 2000? Forthcoming.
[3] M. Mohri. Compact representation by finite-state transducers. In Proceedings of the 32nd meeting of the Association for Computational Linguistics (ACL 94), 1994.
[4] M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, pages 269–312, 1997.
[5] M.P. Schützenberger. Sur une variante des fonctions sequentielles. Theoretical Computer Science, 4(1):47–57, 1977.
[6] G. van Noord and D. Gerdemann. An extendible regular expression compiler for finite-state approaches in natural language processing. In Proceedings of the Workshop on Implementing Automata (WIA’99), LNCS, Potsdam, 1999. Springer. To appear.
© 2001 Springer-Verlag Berlin Heidelberg
Gaál, T. (2001). Extended Sequentialization of Transducers. In: Yu, S., Păun, A. (eds) Implementation and Application of Automata. CIAA 2000. Lecture Notes in Computer Science, vol 2088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44674-5_32
DOI: https://doi.org/10.1007/3-540-44674-5_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42491-8
Online ISBN: 978-3-540-44674-3
eBook Packages: Springer Book Archive