From regular expressions to DFA's using compressed NFA's

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 644)

Abstract

We show how to turn a regular expression R of length r into an O(s) space representation of McNaughton and Yamada's NFA, where s is the number of occurrences of alphabet symbols in R, and s+1 is the number of NFA states. The standard adjacency list representation of McNaughton and Yamada's NFA takes up s + s² space in the worst case. The adjacency list representation of the NFA produced by Thompson takes up between 2r and 6r space, where r can be arbitrarily larger than s. Given any set V of NFA states, our representation can be used to compute the set U of states one transition away from the states in V in optimal time O(|V| + |U|). McNaughton and Yamada's NFA requires Θ(|V| × |U|) time in the worst case. Using Thompson's NFA, the equivalent calculation requires Θ(r) time in the worst case.
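For orientation, the following is a minimal sketch of the uncompressed baseline mentioned above: a McNaughton and Yamada style NFA stored as adjacency lists, with one state per symbol occurrence plus a start state 0. It is not the compressed representation contributed by the paper. The class names (`Sym`, `Cat`, `Alt`, `Star`) and the helpers `analyse`, `build_position_nfa`, and `step` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:          # one occurrence of an alphabet symbol
    ch: str
    pos: int        # occurrences are numbered 1..s; state 0 is the start state


@dataclass(frozen=True)
class Cat:          # concatenation
    left: object
    right: object


@dataclass(frozen=True)
class Alt:          # alternation
    left: object
    right: object


@dataclass(frozen=True)
class Star:         # Kleene star
    inner: object


def analyse(n, follow, label):
    """Return (nullable, first, last) for n; fill follow[p] and label[p]."""
    if isinstance(n, Sym):
        follow.setdefault(n.pos, set())
        label[n.pos] = n.ch
        return False, {n.pos}, {n.pos}
    if isinstance(n, Star):
        _, first, last = analyse(n.inner, follow, label)
        for p in last:                      # loop back from last to first
            follow[p] |= first
        return True, first, last
    nl, fl, ll = analyse(n.left, follow, label)
    nr, fr, lr = analyse(n.right, follow, label)
    if isinstance(n, Alt):
        return nl or nr, fl | fr, ll | lr
    for p in ll:                            # Cat: last(left) precedes first(right)
        follow[p] |= fr
    first = fl | fr if nl else fl
    last = ll | lr if nr else lr
    return nl and nr, first, last


def build_position_nfa(regex):
    """Adjacency-list NFA with s + 1 states and no epsilon transitions."""
    follow, label = {}, {}
    nullable, first, last = analyse(regex, follow, label)
    follow[0] = set(first)
    finals = set(last) | ({0} if nullable else set())
    return follow, label, finals


def step(follow, V):
    """States one transition away from V (over any symbol).

    Every adjacency list of every state in V is scanned, so the cost is the
    sum of the list lengths, which can reach |V| * |U| rather than |V| + |U|.
    """
    U = set()
    for p in V:
        U |= follow[p]
    return U


# (a|b)*abb with symbol occurrences numbered 1..5, so s = 5 and there are 6 states.
REGEX = Cat(Cat(Cat(Star(Alt(Sym("a", 1), Sym("b", 2))), Sym("a", 3)),
                Sym("b", 4)), Sym("b", 5))
FOLLOW, LABEL, FINALS = build_position_nfa(REGEX)
```

In this example states 0, 1, and 2 each store the same successor list {1, 2, 3}. That kind of repetition is what drives the adjacency-list size toward s + s² in the worst case.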

An implementation of our NFA representation confirms that it takes up an order of magnitude less space than McNaughton and Yamada's machine. An implementation to produce a DFA from our NFA representation by subset construction shows linear and quadratic speedups over subset construction starting from both Thompson's and McNaughton and Yamada's NFA's. It also shows that the DFA produced from our NFA is as much as one order of magnitude smaller than DFA's constructed from the two other NFA's.
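As a point of reference, textbook subset construction over an NFA without epsilon transitions can be sketched as follows. This is the generic algorithm, not the authors' implementation. The function names and the transition table `DELTA` (a hand-written NFA for (a|b)*abb with start state 0) are assumptions made for this illustration.

```python
from collections import deque


def subset_construction(delta, start, finals, alphabet):
    """Build a DFA from an epsilon-free NFA.

    delta maps (state, symbol) to a set of NFA states.
    Returns (dfa_start, dfa_transitions, dfa_finals); DFA states are frozensets.
    """
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        V = queue.popleft()
        if V in dfa:
            continue                        # already expanded
        dfa[V] = {}
        for a in alphabet:
            # One-step successor set of V on symbol a.
            U = frozenset(q for p in V for q in delta.get((p, a), ()))
            if U:
                dfa[V][a] = U
                if U not in dfa:
                    queue.append(U)
    dfa_finals = {V for V in dfa if V & finals}
    return start_set, dfa, dfa_finals


def accepts(dfa_start, dfa, dfa_finals, word):
    """Run the DFA on word; a missing transition means rejection."""
    state = dfa_start
    for ch in word:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in dfa_finals


# Hypothetical NFA for (a|b)*abb: states 1..5 are symbol occurrences, 0 is the start.
DELTA = {
    (0, "a"): {1, 3}, (0, "b"): {2},
    (1, "a"): {1, 3}, (1, "b"): {2},
    (2, "a"): {1, 3}, (2, "b"): {2},
    (3, "b"): {4},
    (4, "b"): {5},
}
START, DFA, DFA_FINALS = subset_construction(DELTA, 0, {5}, "ab")
```

The inner computation of `U` is the step whose cost depends on the NFA representation, which is why a faster one-step successor calculation speeds up the whole construction.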

Throughout this paper the importance of syntax is stressed in the design of our algorithms. In particular, we exploit a method of program improvement in which costly repeated calculations can be avoided by establishing and maintaining program invariants. This method of symbolic finite differencing has been used previously by Douglas Smith to derive efficient functional programs.
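The general idea of maintaining an invariant instead of recomputing a costly expression can be illustrated with a deliberately small example. It is a generic sketch of the technique, not one of the derivations from the paper, and the class `SumOfSquares` is a hypothetical name.

```python
class SumOfSquares:
    """Maintains the invariant: self.total == sum(x * x for x in self.items)."""

    def __init__(self):
        self.items = []
        self.total = 0          # invariant holds trivially for the empty list

    def add(self, x):
        # Re-establish the invariant with an O(1) update instead of
        # recomputing the whole sum after every change.
        self.items.append(x)
        self.total += x * x

    def remove_last(self):
        x = self.items.pop()
        self.total -= x * x
        return x


acc = SumOfSquares()
for value in (1, 2, 3):
    acc.add(value)
```

Each update costs constant time, whereas recomputing `sum(x * x for x in items)` after every change would cost time linear in the number of items.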

This research was partially supported by Office of Naval Research Grant No. N00014-90-J-1890 and Air Force Office of Scientific Research Grant No. AFOSR-91-0308.

References

  1. Aho, A., Hopcroft, J. and Ullman, J., “The Design and Analysis of Computer Algorithms”, Reading, Addison-Wesley, 1974.

  2. Aho, A., Sethi, R. and Ullman, J., “Compilers: Principles, Techniques, and Tools”, Reading, Addison-Wesley, 1986.

  3. Aho, A., “Pattern Matching in Strings”, in Formal Language Theory, ed. R. V. Book, Academic Press, Inc., 1980.

  4. Berry, G. and Cosserat, L., “The Esterel synchronous programming language and its mathematical semantics”, in Seminar on Concurrency, S. D. Brookes, A. W. Roscoe, and G. Winskel, eds., LNCS 197, Springer-Verlag, 1985.

  5. Berry, G. and Sethi, R., “From Regular Expressions to Deterministic Automata”, Theoretical Computer Science, 48 (1986), pp. 117–126.

  6. Brüggemann-Klein, A., “Regular Expressions into Finite Automata”, To appear in Theoretical Computer Science, 1992.

  7. Brzozowski, J., “Derivatives of Regular Expressions”, JACM, Vol. 11, No. 4, Oct. 1964, pp. 481–494.

  8. Cai, J. and Paige, R., “Look Ma, No Hashing, And No Arrays Neither”, ACM POPL, Jan. 1991, pp. 143–154.

  9. Chang, C., Ph. D. Thesis, To Appear, 1992.

  10. Emerson, E. and Lei, C., “Model Checking in the Propositional Mu-Calculus”, Proc. IEEE Conf. on Logic in Computer Science, 1986, pp. 86–106.

  11. Hopcroft, J. and Ullman, J., “Formal Languages and Their Relation to Automata”, Reading, Addison-Wesley, 1969.

  12. Kleene, S., “Representation of events in nerve nets and finite automata”, in Automata Studies, Ann. Math. Studies No. 34, Princeton U. Press, 1956, pp. 3–41.

  13. Knuth, D., “On the translation of languages from left to right”, Information and Control, Vol. 8, Num. 6, 1965, pp. 607–639.

  14. McNaughton, R. and Yamada, H., “Regular Expressions and State Graphs for Automata”, IRE Trans. on Electronic Computers, Vol. EC-9, No. 1, Mar. 1960, pp. 39–47.

  15. Myhill, J., “Finite automata and representation of events”, WADC, Tech. Rep. 57-624, 1957.

  16. Nerode, A., “Linear automaton transformations”, Proc. Amer. Math Soc., Vol. 9, pp. 541–544, 1958.

  17. Rabin, M. and Scott, D., “Finite automata and their decision problems”, IBM J. Res. Develop., Vol. 3, No. 2, Apr. 1959, pp. 114–125.

  18. Ritchie, D. and Thompson, K., “The UNIX Time-Sharing System”, Communications of the ACM, Vol. 17, No. 7, Jul. 1974, pp. 365–375.

  19. Smith, D., “KIDS — A Knowledge-Based Software Development System”, in Proc. Workshop on Automating Software Design, AAAI-88, 1988.

  20. “SunOS Reference Manual VOL. II”, Programmer's Manual, SUN microsystems, 1989.

  21. Thompson, K., “Regular Expression Search Algorithm”, Communications of the ACM 11:6 (1968), pp. 419–422.

  22. Ullman, J., “Computational Aspects of VLSI”, Computer Science Press, 1984.

Editor information

Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chang, CH., Paige, R. (1992). From regular expressions to DFA's using compressed NFA's. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_8

  • DOI: https://doi.org/10.1007/3-540-56024-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56024-1

  • Online ISBN: 978-3-540-47357-2

  • eBook Packages: Springer Book Archive
