Extended Sequentialization of Transducers

Gaál, Tamás

doi:10.1007/3-540-44674-5_32

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2088))

Included in the following conference series:

International Conference on Implementation and Application of Automata

Abstract

Sequential transducers, introduced by Schützenberger [5], have advantageous computational properties. A sequential transducer is deterministic with respect to its input. Not every transducer can be sequentialized, but when one can be, the result is optimal in time and often in space. This article extends the subsequentialization algorithm of Mohri [3,4] to previously untreated classes of transducers. We

- change the representation of final p-strings,
- extend the sequentialization to input ε labels and their closures,
- handle the unknown symbol.

Mohri uses final p-strings to express p-subsequentiality. We convert them to real arcs and states to have a more uniform representation and to maintain the two-sided applicability of the transducer. This change is of linear complexity.
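
The conversion can be pictured with a minimal Python sketch. The abstract does not state the exact encoding, so everything here is an assumption: the dictionary representation, the name `expand_final_strings`, and the choice to turn each final string into a chain of arcs with an ε input label and one output symbol per arc. The number of arcs added equals the total length of the final strings, which is consistent with a linear-cost conversion.

```python
EPS = ""  # assumed marker for the empty (epsilon) label


def expand_final_strings(arcs, final_out):
    """Replace final output strings by ordinary arcs and states.

    arcs:      dict state -> list of (input_symbol, output_symbol, target)
    final_out: dict final state -> list of final output strings (the p-strings)
    Returns (new_arcs, new_finals). States are assumed to be integers.
    """
    new_arcs = {q: list(out) for q, out in arcs.items()}
    states = set(arcs) | set(final_out)
    states |= {t for out in arcs.values() for (_, _, t) in out}
    fresh = max(states, default=-1) + 1  # first unused state number
    new_finals = set()
    for q, strings in final_out.items():
        for w in strings:
            if not w:
                new_finals.add(q)  # empty final string: q simply stays final
                continue
            src = q
            for ch in w:
                # one epsilon-input arc per symbol of the final string
                new_arcs.setdefault(src, []).append((EPS, ch, fresh))
                src = fresh
                fresh += 1
            new_finals.add(src)  # end of the chain becomes the final state
    return new_arcs, new_finals


arcs = {0: [("a", "x", 1)]}
final_out = {1: ["yz", "w"]}  # state 1 is 2-subsequential in this toy example
new_arcs, new_finals = expand_final_strings(arcs, final_out)
```

In this toy example three arcs are added, one per symbol of "yz" and "w", and the new final states are the ends of the two chains.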

An ε-closure set and appropriate modifications in the subsequentialization algorithm of Mohri make it possible to handle transducers containing input-side ε labels. This does not require any intermediate transformation of the transducer.
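
As a rough illustration of what an ε-closure set might contain, the following Python sketch collects every state reachable from a given state through input-ε arcs, together with the output accumulated on the way. The representation and the name `eps_closure` are hypothetical, not taken from the paper, and the sketch assumes there are no cycles of input-ε arcs.

```python
EPS = ""  # assumed marker for the empty (epsilon) label


def eps_closure(arcs, start):
    """Return the set of (state, output) pairs reachable from `start`
    by following only arcs whose input label is epsilon.

    arcs: dict state -> list of (input_symbol, output_symbol, target)
    Assumes the input-epsilon arcs form no cycle; otherwise the
    accumulated outputs could grow without bound.
    """
    closure = set()
    stack = [(start, "")]
    while stack:
        state, out = stack.pop()
        if (state, out) in closure:
            continue
        closure.add((state, out))
        for in_sym, out_sym, target in arcs.get(state, []):
            if in_sym == EPS:
                stack.append((target, out + out_sym))
    return closure


arcs = {
    0: [(EPS, "x", 1), ("a", "b", 2)],
    1: [(EPS, "y", 2)],
}
closure = eps_closure(arcs, 0)
```

In a subset-construction style algorithm, such pairs would presumably be merged into the current state subset before the next input symbol is read, so that no separate ε-removal pass over the transducer is needed.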

Our ε-closure modification solves a complexity problem in subsequentiable transducers that contain arcs with an ε label either on the input or on the output side. An illustration of such cases is the transduction of a string of length n to another string of the same length (Fig. 1). Such a transducer can be ambiguous according to a rapidly growing function of n; a (modest) lower bound for the number of possible paths is $O(\binom{2n}{n})$, for which the lower bound is $o(3^n)$; that is, such a transducer has an exponential number of ambiguous paths expressing the same mapping. If such a transducer is Kleene-starred (allowing repetitions of the input string), k repetitions will require more than $O(\binom{2n}{n}^k)$ recognition complexity. We only know a recursive form for the number of possible paths in the general case, but lower-bound approximations show that such cases rapidly become intractable. Such transducers can, however, be transformed by ε-sequentialization, making recognition complexity linear, $O(nk)$.
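
The growth can be checked numerically with a small Python sketch. Fig. 1 is not reproduced here, so the model is an assumption: at each step a path may consume one input symbol with ε output, emit one output symbol with ε input, or do both at once. With only the two ε moves, the number of paths is the central binomial coefficient; with all three moves, the count follows a simple recursion, one candidate for the "recursive form" mentioned above. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def eps_only_paths(n):
    """Interleavings of n input-side and n output-side epsilon moves."""
    return comb(2 * n, n)


@lru_cache(maxsize=None)
def all_paths(i, j):
    """Paths when i input and j output symbols remain and three moves
    are allowed: input only, output only, or both together."""
    if i == 0 or j == 0:
        return 1  # only one kind of move is left
    return all_paths(i - 1, j) + all_paths(i, j - 1) + all_paths(i - 1, j - 1)


for n in range(1, 7):
    print(n, eps_only_paths(n), all_paths(n, n), 3 ** n)
```

Under this model the central binomial coefficient overtakes 3ⁿ from n = 5 onward (252 against 243), and the three-move count grows faster still.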

By using the ε-closure, the ambiguities stemming from input ε transitions can be handled, and both ordinary and ε-ambiguities are (sub)sequentialized in the same step, by local modifications. This extension does not increase the complexity of the original algorithm. The ε-closure operation is a semiring operation creating a set of directed acyclic graphs. Such ε-ambiguities may and do arise in finite-state compilers and tools.

The unknown symbol is an extension of the usual transducer notation to define special treatment for input symbols not in the input alphabet of the transducer [1]. It can be present, if handled specially, in sequentialization and in subsequent finite-state calculus operations and applications.
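
A minimal sketch of how an unknown symbol might be treated when a sequential transducer is applied is given below. The marker `"?"`, the table layout, and the convention that an unknown output copies the input symbol are all assumptions made for illustration; the abstract does not specify them. The loop also shows why application is linear in the input length: each symbol selects at most one transition.

```python
UNKNOWN = "?"  # hypothetical marker for "any symbol outside the alphabet"


def apply_sequential(delta, alphabet, start, finals, text):
    """Apply a sequential transducer to `text` in one left-to-right pass.

    delta: dict (state, input_symbol) -> (output, next_state)
    Symbols not in `alphabet` are looked up under UNKNOWN. If the arc's
    output is UNKNOWN too, the input symbol is copied (an assumption).
    Returns the output string, or None if the input is rejected.
    """
    state, out = start, []
    for ch in text:
        key = ch if ch in alphabet else UNKNOWN
        if (state, key) not in delta:
            return None  # no transition: input not accepted
        emitted, state = delta[(state, key)]
        out.append(ch if emitted == UNKNOWN else emitted)
    return "".join(out) if state in finals else None


delta = {
    (0, "a"): ("A", 0),          # known symbol: rewrite a as A
    (0, UNKNOWN): (UNKNOWN, 0),  # unknown symbol: pass through unchanged
}
result = apply_sequential(delta, {"a"}, 0, {0}, "axa")
```

Here `result` is `"AxA"`: the symbol `x` is outside the alphabet, so it takes the unknown arc and is copied to the output.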

References

[1] T. Gaál and L. Karttunen. Improving Mohri’s algorithm in the Xerox finite state calculus. In Proceedings of the 9th International Conference on Automata and Formal Languages (AFL’99), 1999. To appear in Publicationes Mathematicae.
[2] L. Karttunen and K.R. Beesley. Finite-State Morphology: Xerox Tools and Techniques. Cambridge University Press, Cambridge UK, 2000? Forthcoming.
[3] M. Mohri. Compact representation by finite-state transducers. In Proceedings of the 32nd meeting of the Association for Computational Linguistics (ACL 94), 1994.
[4] M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, pages 269–312, 1997.
[5] M.P. Schützenberger. Sur une variante des fonctions sequentielles. Theoretical Computer Science, 4(1):47–57, 1977.
[6] G. van Noord and D. Gerdemann. An extendible regular expression compiler for finite-state approaches in natural language processing. In Proceedings of the Workshop on Implementing Automata (WIA’99), LNCS, Potsdam, 1999. Springer. To appear.

Author information

Authors and Affiliations

Xerox Research Centre Europe - Grenoble Laboratory, 6, chemin de Maupertuis, 38240, Meylan, France
Tamás Gaál

Editor information

Editors and Affiliations

Department of Computer Science, Middlesex College, The University of Western Ontario, London, ON, Canada, N6A 5B7
Sheng Yu & Andrei Păun

About this paper

Cite this paper

Gaál, T. (2001). Extended Sequentialization of Transducers. In: Yu, S., Păun, A. (eds) Implementation and Application of Automata. CIAA 2000. Lecture Notes in Computer Science, vol 2088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44674-5_32

DOI: https://doi.org/10.1007/3-540-44674-5_32
Published: 20 September 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42491-8
Online ISBN: 978-3-540-44674-3
eBook Packages: Springer Book Archive
