Abstract
We show how to turn a regular expression R of length r into an O(s) space representation of McNaughton and Yamada's NFA, where s is the number of occurrences of alphabet symbols in R, and s+1 is the number of NFA states. The standard adjacency list representation of McNaughton and Yamada's NFA takes up s + s² space in the worst case. The adjacency list representation of the NFA produced by Thompson takes up between 2r and 6r space, where r can be arbitrarily larger than s. Given any set V of NFA states, our representation can be used to compute the set U of states one transition away from the states in V in optimal time O(|V| + |U|). McNaughton and Yamada's NFA requires Θ(|V| × |U|) time in the worst case. Using Thompson's NFA, the equivalent calculation requires Θ(r) time in the worst case.
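For orientation, the following is a minimal sketch of the baseline that the abstract compares against: the standard adjacency list form of McNaughton and Yamada's NFA, built from the usual nullable/first/last/follow sets. It is not the compressed representation proposed in the paper. The AST classes (Sym, Cat, Alt, Star) and the function names analyze, build_nfa, and successors are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for regular expressions (illustrative names only).
@dataclass(frozen=True)
class Sym:
    char: str
    pos: int  # position 1..s of this alphabet-symbol occurrence

@dataclass(frozen=True)
class Cat:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:
    inner: "Regex"

Regex = Union[Sym, Cat, Alt, Star]

def analyze(r, follow, label):
    """Return (nullable, first, last) for r; fill follow and label as side effects."""
    if isinstance(r, Sym):
        label[r.pos] = r.char
        follow.setdefault(r.pos, set())
        return False, {r.pos}, {r.pos}
    if isinstance(r, Alt):
        n1, f1, l1 = analyze(r.left, follow, label)
        n2, f2, l2 = analyze(r.right, follow, label)
        return n1 or n2, f1 | f2, l1 | l2
    if isinstance(r, Cat):
        n1, f1, l1 = analyze(r.left, follow, label)
        n2, f2, l2 = analyze(r.right, follow, label)
        for p in l1:                      # last of left is followed by first of right
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if isinstance(r, Star):
        _, f, l = analyze(r.inner, follow, label)
        for p in l:                       # loop back from last to first
            follow[p] |= f
        return True, f, l
    raise TypeError(f"unknown node: {r!r}")

def build_nfa(r):
    """Adjacency list NFA with states 0..s, where state 0 is the start state."""
    follow, label = {}, {}
    nullable, first, last = analyze(r, follow, label)
    follow[0] = set(first)
    accepting = set(last) | ({0} if nullable else set())
    return follow, label, accepting

def successors(follow, V):
    """Set U of states one transition away from V, by scanning every edge out of V."""
    U = set()
    for p in V:
        U |= follow[p]
    return U

# (a|b)*abb with positions a1 b2 a3 b4 b5, so s = 5 and the NFA has 6 states.
regex = Cat(Cat(Cat(Star(Alt(Sym("a", 1), Sym("b", 2))), Sym("a", 3)),
                Sym("b", 4)), Sym("b", 5))
follow, label, accepting = build_nfa(regex)
```

In this sketch, successors visits every edge leaving V, so its cost is the total out-degree of V rather than |V| + |U|. When many states in V share the same successors, that total can approach |V| × |U|, which is the worst case the abstract attributes to this representation.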
An implementation of our NFA representation confirms that it takes up an order of magnitude less space than McNaughton and Yamada's machine. An implementation to produce a DFA from our NFA representation by subset construction shows linear and quadratic speedups over subset construction starting from both Thompson's and McNaughton and Yamada's NFA's. It also shows that the DFA produced from our NFA is as much as one order of magnitude smaller than DFA's constructed from the two other NFA's.
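As a point of reference for the subset construction mentioned above, here is a sketch of the textbook algorithm over an NFA without epsilon moves. It is the generic construction, not the authors' implementation, and it does not use the compressed representation. The function name subset_construction and the dictionary encoding of the transition relation are assumptions made for this example.

```python
from collections import deque

def subset_construction(delta, start, accepting, alphabet):
    """Textbook subset construction for an NFA without epsilon moves.

    delta maps (state, symbol) to a set of NFA states (assumed encoding).
    Returns (dfa_states, dfa_delta, dfa_start, dfa_accepting); the dead
    state is omitted, so dfa_delta is a partial function.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        V = queue.popleft()
        for a in alphabet:
            # U is the set of NFA states one a-transition away from V.
            U = frozenset(q for p in V for q in delta.get((p, a), ()))
            if not U:
                continue
            dfa_delta[(V, a)] = U
            if U not in seen:
                seen.add(U)
                queue.append(U)
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, dfa_start, dfa_accepting

# McNaughton and Yamada style NFA for (a|b)*abb: state 0 is the start state,
# states 1..5 are the symbol occurrences a1 b2 a3 b4 b5.
delta = {
    (0, "a"): {1, 3}, (0, "b"): {2},
    (1, "a"): {1, 3}, (1, "b"): {2},
    (2, "a"): {1, 3}, (2, "b"): {2},
    (3, "b"): {4},
    (4, "b"): {5},
}
states, dfa_delta, dfa_start, dfa_accepting = subset_construction(
    delta, start=0, accepting={5}, alphabet="ab")
```

The inner computation of U is exactly the "one transition away" step discussed in the abstract, repeated once per DFA state and symbol, which is why the cost of that step dominates the running time of the whole construction.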
Throughout this paper the importance of syntax is stressed in the design of our algorithms. In particular, we exploit a method of program improvement in which costly repeated calculations can be avoided by establishing and maintaining program invariants. This method of symbolic finite differencing has been used previously by Douglas Smith to derive efficient functional programs.
This research was partially supported by Office of Naval Research Grant No. N00014-90-J-1890 and Air Force Office of Scientific Research Grant No. AFOSR-91-0308.
References
Aho, A., Hopcroft, J. and Ullman, J., “Design and Analysis of Computer Algorithms”, Reading, Addison-Wesley, 1974.
Aho, A., Sethi, R. and Ullman, J., “Compilers: Principles, Techniques, and Tools”, Reading, Addison-Wesley, 1986.
Aho, A., “Pattern Matching in Strings”, in Formal Language Theory, ed. R. V. Book, Academic Press, Inc. 1980.
Berry, G. and Cosserat, L., “The Esterel synchronous programming language and its mathematical semantics” in Seminar in Concurrency, S. D. Brookes, A. W. Roscoe, and G. Winskel, eds., LNCS 197, Springer-Verlag, 1985.
Berry, G. and Sethi, R., “From Regular Expressions to Deterministic Automata” Theoretical Computer Science, 48 (1986), pp. 117–126.
Brüggemann-Klein, A., “Regular Expressions into Finite Automata”, To appear in Theoretical Computer Science, 1992.
Brzozowski, J., “Derivatives of Regular Expressions”, JACM, Vol. 11, No. 4., Oct. 1964, pp. 481–494.
Cai, J. and Paige, R., “Look Ma, No Hashing, And No Arrays Neither”, ACM POPL, Jan. 1991, pp. 143–154.
Chang, C., Ph. D. Thesis, To Appear, 1992.
Emerson, E. and Lei, C., “Model Checking in the Propositional Mu-Calculus”, Proc. IEEE Conf. on Logic in Computer Science, 1986, pp. 86–106.
Hopcroft, J. and Ullman, J., “Formal Languages and Their Relation to Automata”, Reading, Addison-Wesley, 1969.
Kleene, S., “Representation of events in nerve nets and finite automata”, in Automata Studies, Ann. Math. Studies No. 34, Princeton U. Press, 1956, pp. 3–41.
Knuth, D., “On the translation of languages from left to right”, Information and Control, Vol. 8, Num. 6, 1965, pp. 607–639.
McNaughton, R. and Yamada, H., “Regular Expressions and State Graphs for Automata”, IRE Trans. on Electronic Computers, Vol. EC-9, No. 1, Mar. 1960, pp. 39–47.
Myhill, J., “Finite automata and representation of events,” WADC, Tech. Rep. 57-624, 1957.
Nerode, A., “Linear automaton transformations,” Proc. Amer. Math Soc., Vol. 9, pp. 541–544, 1958.
Rabin, M. and Scott, D., “Finite automata and their decision problems” IBM J. Res. Develop., Vol. 3, No. 2, Apr., 1959, pp. 114–125.
Ritchie, D. and Thompson, K., “The UNIX Time-Sharing System”, Communications of the ACM, Vol. 17, No. 7, Jul. 1974, pp. 365–375.
Smith, D., “KIDS — A Knowledge-Based Software Development System”, in Proc. Workshop on Automating Software Design, AAAI-88, 1988.
“SunOS Reference Manual VOL. II”, Programmer's Manual, SUN microsystems, 1989.
Thompson, K., “Regular Expression Search Algorithm”, Communications of the ACM 11:6 (1968), pp. 419–422.
Ullman, J., “Computational Aspects of VLSI”, Computer Science Press, 1984.
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, CH., Paige, R. (1992). From regular expressions to DFA's using compressed NFA's. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_8
DOI: https://doi.org/10.1007/3-540-56024-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56024-1
Online ISBN: 978-3-540-47357-2
eBook Packages: Springer Book Archive