Regular expressions into finite automata

  • Conference paper
LATIN '92 (LATIN 1992)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 583))

Abstract

It is well established that every regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ε-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of an ε-free NFA due to Glushkov [Glu61] is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives [Brz64] of the expression. Moreover, the Glushkov construction also plays a significant role in document processing: the SGML standard [ISO86], now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types.
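
To make the notions of Glushkov automaton and deterministic expression concrete, the following is a minimal Python sketch of the textbook position-based construction: each symbol occurrence becomes a position, and the automaton is described by the sets first, last and follow. It is a naive illustration only. It is not the algorithm of this paper and makes no attempt to meet the time bounds discussed here. The tuple encoding of expressions and the names `glushkov` and `is_deterministic` are assumptions made for this sketch.

```python
from itertools import count

# Hypothetical AST encoding (an assumption of this sketch):
#   ("sym", a) | ("eps",) | ("cat", e, f) | ("alt", e, f) | ("star", e)


def glushkov(expr):
    """Return (first, last, follow, nullable, symbol_of) for expr.

    Positions are the symbol occurrences of expr, numbered 1, 2, ...
    from left to right; they are the non-initial states of the
    Glushkov automaton.
    """
    counter = count(1)
    symbol_of = {}  # position -> alphabet symbol
    follow = {}     # position -> positions that may come directly after it

    def visit(e):
        kind = e[0]
        if kind == "eps":
            return set(), set(), True
        if kind == "sym":
            p = next(counter)
            symbol_of[p] = e[1]
            follow[p] = set()
            return {p}, {p}, False
        if kind == "star":
            first, last, _ = visit(e[1])
            for p in last:          # loop back from the end to the start
                follow[p] |= first
            return first, last, True
        if kind not in ("alt", "cat"):
            raise ValueError(f"unknown node kind: {kind!r}")
        f1, l1, n1 = visit(e[1])
        f2, l2, n2 = visit(e[2])
        if kind == "alt":
            return f1 | f2, l1 | l2, n1 or n2
        for p in l1:                # concatenation: left ends feed right starts
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return first, last, n1 and n2

    first, last, nullable = visit(expr)
    return first, last, follow, nullable, symbol_of


def is_deterministic(expr):
    """True if no state of the Glushkov automaton has two successors
    (positions) labelled with the same symbol."""
    first, _, follow, _, symbol_of = glushkov(expr)
    for targets in [first, *follow.values()]:
        symbols = [symbol_of[q] for q in targets]
        if len(symbols) != len(set(symbols)):
            return False
    return True


a, b = ("sym", "a"), ("sym", "b")
e1 = ("cat", ("star", ("alt", a, b)), a)          # (a|b)*a
b_star_a = ("cat", ("star", b), a)
e2 = ("cat", b_star_a, ("star", b_star_a))        # b*a(b*a)*

print(is_deterministic(e1))  # False: two a-positions compete at the start
print(is_deterministic(e2))  # True
```

Both example expressions denote the same language (all words over {a, b} ending in a), yet only the second is deterministic in the sense used above, which shows that determinism is a property of the expression rather than of the language.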

In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the expression, and that this is worst-case optimal. For deterministic expressions, our algorithm even runs in linear time. This improves on the cubic-time methods suggested in the literature [BEGO71,ASU86,BS86]. A major step of the algorithm consists in bringing the expression into what we call star normal form. This concept is also useful for characterizing the relationship between two types of unambiguity that have been studied in the literature. Namely, we show that, modulo a technical condition, an expression is strongly unambiguous [SS88] if and only if it is weakly unambiguous [BEGO71] and in star normal form. This leads to our third result, a quadratic-time decision algorithm for weak unambiguity, which improves on the biquadratic method introduced by Book et al. [BEGO71].

References

  1. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Series in Computer Science, Addison-Wesley, Reading, Massachusetts, 1986.

  2. Ronald Book, Shimon Even, Sheila Greibach, and Gene Ott. Ambiguity in graphs and expressions. IEEE Transactions on Computers, C-20(2):149–153, February 1971.

  3. Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM, 11(4):481–494, October 1964.

  4. Gerard Berry and Ravi Sethi. From regular expressions to deterministic automata. Theoretical Computer Science, 48:117–126, 1986.

  5. Anne Brüggemann-Klein and Derick Wood. Parser generators for document grammars. Submitted for publication, 1991.

  6. V.M. Glushkov. The abstract theory of automata. Russian Mathematical Surveys, 16:1–53, 1961.

  7. Frederick C. Hennie. Finite-State Models for Logical Machines. John Wiley, New York, 1968.

  8. John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley Series in Computer Science, Addison-Wesley, Reading, Massachusetts, 1979.

  9. ISO 8879. Information processing — text and office systems — standard generalized markup language (SGML). October 1986. International Organization for Standardization.

  10. Seppo Sippu and Eljas Soisalon-Soininen. Parsing Theory. Volume 1, Languages and Parsing, of EATCS Monographs on Theoretical Computer Science, Springer-Verlag, Berlin, 1988.

  11. Derick Wood. Theory of Computation. John Wiley, New York, 1987.

Editor information

Imre Simon

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brüggemann-Klein, A. (1992). Regular expressions into finite automata. In: Simon, I. (eds) LATIN '92. LATIN 1992. Lecture Notes in Computer Science, vol 583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0023820

  • DOI: https://doi.org/10.1007/BFb0023820

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-55284-0

  • Online ISBN: 978-3-540-47012-0

  • eBook Packages: Springer Book Archive