Extended Sequentialization of Transducers

  • Conference paper
Implementation and Application of Automata (CIAA 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2088))

Abstract

Sequential transducers, introduced by Schützenberger [5], have advantageous computational properties. A sequential transducer is deterministic with respect to its input. Not every transducer can be sequentialized, but one that can be is optimal in time and, often, in space as well. This article extends the subsequentialization algorithm of Mohri [3,4] to previously untreated classes of transducers. We

  1. change the representation of final p-strings,

  2. extend the sequentialization to input ε labels and their closures,

  3. handle the unknown symbol.

Mohri uses final p-strings to express p-subsequentiality. We convert them to real arcs and states to have a more uniform representation and to maintain the two-sided applicability of the transducer. This change is of linear complexity.
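The conversion of final strings into arcs and states can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the arc tuple layout, the `EPS` marker, and the function name `materialize_final_strings` are hypothetical, and the sketch assumes that each final string is spelled out on a chain of arcs with an ε input label, which the abstract does not specify.

```python
EPS = ""  # hypothetical marker for the empty (epsilon) label


def materialize_final_strings(arcs, final_strings):
    """Turn per-state final output strings into ordinary arcs and states.

    arcs: list of (src, in_label, out_label, dst) with integer states.
    final_strings: dict mapping a state to the strings emitted on acceptance.
    Returns (new_arcs, final_states).
    Assumption: each final string becomes a chain of eps-input arcs.
    """
    new_arcs = list(arcs)
    states = {s for a in arcs for s in (a[0], a[3])} | set(final_strings)
    next_state = max(states, default=-1) + 1
    finals = set()
    for state, strings in final_strings.items():
        for w in strings:
            if not w:
                # An empty final string just makes the state final.
                finals.add(state)
                continue
            src = state
            for sym in w:
                # One new state and one new arc per output symbol.
                new_arcs.append((src, EPS, sym, next_state))
                src = next_state
                next_state += 1
            finals.add(src)
    return new_arcs, finals


arcs = [(0, "a", "x", 1)]
new_arcs, finals = materialize_final_strings(arcs, {1: ["yz", ""]})
```

The loop adds one arc and one state per symbol of every final string, so the work is proportional to the total length of the final strings, in line with the linear complexity stated above.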

An ε-closure set and appropriate modifications in the subsequentialization algorithm of Mohri make it possible to handle transducers containing input-side ε labels. This does not require any intermediate transformation of the transducer.
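As a rough illustration of an input-side ε-closure, the sketch below collects every state reachable through arcs whose input label is ε, together with the output accumulated along the way. The data layout, the `EPS` marker, and the name `input_epsilon_closure` are assumptions made for this example; the sketch also assumes there are no ε-input cycles that emit output, since it would not terminate on them, and it does not show how the closure is integrated into the subsequentialization algorithm itself.

```python
EPS = ""  # hypothetical marker for the empty (epsilon) label


def input_epsilon_closure(arcs, start):
    """Return the set of (state, output) pairs reachable from `start`
    by following only arcs whose input label is EPS.

    arcs: list of (src, in_label, out_label, dst).
    Assumption: no eps-input cycle emits output (otherwise no termination).
    """
    by_src = {}
    for src, inp, out, dst in arcs:
        if inp == EPS:
            by_src.setdefault(src, []).append((out, dst))

    closure = {(start, "")}
    stack = [(start, "")]
    while stack:
        state, emitted = stack.pop()
        for out, dst in by_src.get(state, []):
            item = (dst, emitted + out)
            if item not in closure:
                closure.add(item)
                stack.append(item)
    return closure


arcs = [(0, EPS, "x", 1), (1, EPS, "y", 2), (0, "a", "z", 3)]
closure = input_epsilon_closure(arcs, 0)
```

In this toy example the arc labelled `a` is ignored, because only ε-input arcs contribute to the closure.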

Our ε-closure modification solves a complexity problem in subsequentiable transducers that contain arcs with an ε label on either the input or the output side. An illustration of such cases is the transduction of a string of length n to another string of the same length (Fig. 1). Such a transducer can be ambiguous according to a rapidly growing function of n; a (modest) lower bound for the number of possible paths is $O(\binom{2n}{n})$, for which the lower bound is $o(3^n)$; that is, such a transducer has an exponential number of ambiguous paths expressing the same mapping. If such a transducer is Kleene-starred (allowing repetitions of the input string), k repetitions will require more than $O(\binom{2n}{n}^k)$ recognition complexity. We only know a recursive form for the number of possible paths in the general case, but lower-bound approximations show that such cases rapidly become intractable. Such transducers can, however, be transformed by ε-sequentialization, making recognition complexity linear, O(nk).
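To get a feel for the growth rates mentioned above, the following sketch counts paths under one possible reading of the example. Fig. 1 is not reproduced here, so the arc set is an assumption: each step either consumes an input symbol with ε output, emits an output symbol with ε input, or pairs one input symbol with one output symbol. Under that assumption the path count obeys a three-term recursion (the central Delannoy numbers), and paths that use only the two ε steps number $\binom{2n}{n}$. The function name `alignment_paths` is hypothetical.

```python
from math import comb


def alignment_paths(n):
    """Count paths through an n-by-n alignment grid using three step types:
    a:eps, eps:b, and a:b (an assumed arc set, not taken from Fig. 1).

    Uses the recursion D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1).
    """
    row = [1] * (n + 1)  # D(0, j) = 1 for every j
    for _ in range(n):
        new = [1]  # D(i, 0) = 1
        for j in range(1, n + 1):
            new.append(new[j - 1] + row[j] + row[j - 1])
        row = new
    return row[n]


for n in range(1, 8):
    # eps-only paths, the power 3**n, and all paths under the assumed arc set
    print(n, comb(2 * n, n), 3 ** n, alignment_paths(n))
```

Under these assumptions $\binom{2n}{n}$ overtakes $3^n$ from n = 5 onward, and the full path count is larger still.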

By using the ε-closure, the ambiguities stemming from input ε transitions can be handled, and both ordinary and ε-ambiguities are (sub)sequentialized in the same step, by local modifications. This extension does not increase the complexity of the original algorithm. The ε-closure operation is a semiring operation creating a set of directed acyclic graphs. Such ε-ambiguities may and do arise in finite-state compilers and tools.

The unknown symbol is an extension of the usual transducer notation to define special treatment for input symbols not in the input alphabet of the transducer [1]. It can be present, if handled specially, in sequentialization and in subsequent finite-state calculus operations and applications.
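A minimal sketch of how an unknown symbol might behave when a sequential transducer is applied is given below. The semantics are assumptions for illustration only: the label `UNKNOWN`, the transition table layout, and the rule that an `UNKNOWN` output echoes the input symbol are hypothetical and are not taken from the paper or from [1].

```python
UNKNOWN = "?"  # hypothetical label standing for any symbol outside the alphabet


def apply_sequential(delta, alphabet, start, finals, text):
    """Apply a sequential transducer to `text` in a single left-to-right pass.

    delta: dict mapping (state, in_label) to (out_label, next_state).
    Symbols not in `alphabet` are looked up under UNKNOWN (assumed semantics).
    An UNKNOWN output label echoes the input symbol (assumed semantics).
    Returns the output string, or None if the input is rejected.
    """
    state, out = start, []
    for ch in text:
        label = ch if ch in alphabet else UNKNOWN
        if (state, label) not in delta:
            return None
        emitted, state = delta[(state, label)]
        out.append(ch if emitted == UNKNOWN else emitted)
    return "".join(out) if state in finals else None


delta = {(0, "a"): ("b", 0), (0, UNKNOWN): (UNKNOWN, 0)}
result = apply_sequential(delta, {"a"}, 0, {0}, "axa")
```

Because each (state, input label) pair has at most one transition, the pass takes one table lookup per input symbol, which is the kind of input-determinism that makes sequential transducers attractive.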

References

  1. T. Gaál and L. Karttunen. Improving Mohri’s algorithm in the Xerox finite state calculus. In Proceedings of the 9th International Conference on Automata and Formal Languages (AFL’99), 1999. To appear in Publicationes Mathematicae.

  2. L. Karttunen and K.R. Beesley. Finite-State Morphology: Xerox Tools and Techniques. Cambridge University Press, Cambridge UK, 2000? Forthcoming.

  3. M. Mohri. Compact representation by finite-state transducers. In Proceedings of the 32nd meeting of the Association for Computational Linguistics (ACL 94), 1994.

  4. M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, pages 269–312, 1997.

  5. M.P. Schützenberger. Sur une variante des fonctions sequentielles. Theoretical Computer Science, 4(1):47–57, 1977.

  6. G. van Noord and D. Gerdemann. An extendible regular expression compiler for finite-state approaches in natural language processing. In Proceedings of the Workshop on Implementing Automata (WIA’99), LNCS, Potsdam, 1999. Springer. To appear.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gaál, T. (2001). Extended Sequentialization of Transducers. In: Yu, S., Păun, A. (eds) Implementation and Application of Automata. CIAA 2000. Lecture Notes in Computer Science, vol 2088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44674-5_32

  • DOI: https://doi.org/10.1007/3-540-44674-5_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42491-8

  • Online ISBN: 978-3-540-44674-3

  • eBook Packages: Springer Book Archive
