A fast and compact technique of implementing transition tables for finite state automata
Introduction
In many areas of computer science, the finite state automaton (FA) is a well-known model. Applications include lexical and syntax analyzers (LR parsers) in compilers [1], [2], voice recognition [3], bibliographic search [4], [5], [6], spelling checkers [7], and sequential circuits.
Storing and retrieving the transition tables (i.e., the goto functions) of an FA efficiently is an important problem, because data retrieval is the most time-consuming part of many programs, and choosing a good method over a poor one often yields a substantial increase in processing speed. In implementing the state transitions of an FA, the key question is how to efficiently store and retrieve the transitions, or arcs, defined by the goto function.
Typical data structures for storing goto functions are the matrix form and the linear list form. The former stores the undefined goto entries together with the defined ones, whereas the latter stores only the defined ones. Each involves a trade-off between space requirements and access time [1], [7], [8]: the matrix gives constant-time access but wastes space on undefined entries, while the list is compact but must be searched sequentially.
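To make the trade-off concrete, here is a minimal Python sketch of both forms for a small hypothetical machine. The states, symbols and arcs in `ARCS`, and the helper names, are invented for illustration.

```python
from typing import Optional

# Hypothetical machine: states 0..3 over the input symbols 'a', 'b', 'c'.
SYMBOLS = "abc"
COL = {a: i for i, a in enumerate(SYMBOLS)}          # symbol -> column index
ARCS = {(0, "a"): 1, (0, "b"): 2, (1, "c"): 3}       # defined goto entries only
NUM_STATES = 4

# Matrix form: one cell per (state, symbol) pair; None marks an undefined entry.
matrix: list[list[Optional[int]]] = [
    [ARCS.get((s, a)) for a in SYMBOLS] for s in range(NUM_STATES)
]

def goto_matrix(s: int, a: str) -> Optional[int]:
    # O(1) lookup by direct indexing, but undefined entries still occupy cells.
    return matrix[s][COL[a]]

# Linear list form: per state, only the defined (symbol, target) pairs.
lists: list[list[tuple[str, int]]] = [[] for _ in range(NUM_STATES)]
for (s, a), t in ARCS.items():
    lists[s].append((a, t))

def goto_list(s: int, a: str) -> Optional[int]:
    # Compact, but each lookup scans the arcs leaving s sequentially.
    for label, t in lists[s]:
        if label == a:
            return t
    return None

matrix_cells = sum(len(row) for row in matrix)    # 12 cells, 9 of them undefined
list_entries = sum(len(arcs) for arcs in lists)   # 3 entries
```

Even in this tiny machine, three quarters of the matrix cells hold no transition, which is the waste the triple array is designed to avoid.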
Aho and Ullman [1] introduced three one-dimensional arrays, called a triple array, which combine the fast access of an array (O(1) worst-case time) with the compactness of a list. The triple-array method is suitable when the time and space efficiency of the reduced machine matters more than the time spent reducing its goto functions. We call such a machine a static finite state machine. A lexical analyzer [1], [2] is a typical example, since it is never modified by the user and is the only process that must examine the input one character at a time. Static machines also arise in applications such as voice recognition, spelling checking and parsing.
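The lookup that gives the triple array its O(1) access can be sketched as follows. This is a minimal Python illustration assuming the BASE/NEXT/CHECK layout in which the triple array is usually described, integer symbol codes, and a simple first-fit placement; the function names are hypothetical, and the sketch is not the reduction algorithm proposed in this paper.

```python
from typing import Optional

def build_triple_array(num_states: int, arcs: dict[tuple[int, int], int]):
    """Pack arcs (s, a) -> t, with integer symbol codes a >= 0, into BASE/NEXT/CHECK."""
    out: dict[int, list[tuple[int, int]]] = {s: [] for s in range(num_states)}
    for (s, a), t in arcs.items():
        out[s].append((a, t))
    base = [0] * num_states
    next_: list[int] = []
    check: list[int] = []            # -1 marks a hole (unused slot)
    for s in range(num_states):
        labels = [a for a, _ in out[s]]
        if not labels:
            continue                 # no outgoing arcs: BASE value is never used
        b = 0
        # First fit: smallest offset b such that every slot b + a is a hole.
        while any(b + a < len(check) and check[b + a] != -1 for a in labels):
            b += 1
        grow = b + max(labels) + 1 - len(check)
        if grow > 0:
            next_.extend([-1] * grow)
            check.extend([-1] * grow)
        base[s] = b
        for a, t in out[s]:
            next_[b + a] = t         # target state of the arc
            check[b + a] = s         # owner of the slot, verified on lookup
    return base, next_, check

def goto(base, next_, check, s: int, a: int) -> Optional[int]:
    """g(s, a) in O(1): one addition, one bounds test, one CHECK comparison."""
    i = base[s] + a
    if 0 <= i < len(check) and check[i] == s:
        return next_[i]
    return None                      # undefined goto entry

# Example: arcs over symbol codes 0..2 for states 0..3.
base, next_, check = build_triple_array(4, {(0, 0): 1, (0, 1): 2, (1, 2): 3, (2, 0): 3})
```

Here the arrays need only four slots (BASE = [0, 0, 3, 0]) instead of the twelve cells of a 4×3 matrix. CHECK is what allows states 0 and 1 to share BASE value 0 without their arcs being confused.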
This paper proposes an improved method for storing and retrieving transition tables. We first propose an algorithm for reducing machines with a tree structure and then extend the approach to general transition tables. Simulation results verify that the presented method produces compact structures.
A finite state machine
We define a deterministic finite state machine (FA) M as follows:

$$M = (K, I, g, s_I, F)$$

where K and I are a finite set of states and a finite set of input symbols, respectively, g is the goto function mapping K×I to K, s_I ∈ K is the initial state, and F ⊆ K is the set of final states.
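A minimal Python rendering of this definition, for illustration only. It treats g as a partial map (undefined entries are simply absent, matching the undefined goto functions mentioned in the introduction); the class name `FA`, the method `accepts` and the example machine are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FA:
    K: frozenset[int]                 # finite set of states
    I: frozenset[str]                 # finite set of input symbols
    g: dict[tuple[int, str], int]     # goto function K x I -> K (partial here)
    s_I: int                          # initial state, s_I in K
    F: frozenset[int]                 # final states, F subset of K

    def accepts(self, word: str) -> bool:
        s = self.s_I
        for a in word:
            t = self.g.get((s, a))
            if t is None:             # undefined goto entry: reject
                return False
            s = t
        return s in self.F

# Hypothetical machine accepting exactly the words "ab" and "ac".
M = FA(
    K=frozenset({0, 1, 2, 3}),
    I=frozenset("abc"),
    g={(0, "a"): 1, (1, "b"): 2, (1, "c"): 3},
    s_I=0,
    F=frozenset({2, 3}),
)
```

With this machine, `M.accepts("ab")` is true, while `M.accepts("a")` and `M.accepts("abc")` are false.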
Since the transition table of a finite state machine that accepts a finite number of words has a tree structure, we define the following finite state machine.
Definition 1. Let indegree(s) be the number of transitions
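Definition 1 is cut off in this excerpt, so the following Python sketch rests on an assumption: that a T-type (tree-structured) machine is one in which the initial state has indegree 0 and every other state has indegree exactly 1, with every state reachable from the initial state. The helper names are hypothetical.

```python
from collections import Counter

def indegree(g: dict[tuple[int, str], int]) -> Counter:
    """Count, for each state s, the transitions g(s', a) = s that enter it."""
    return Counter(g.values())

def is_t_type(states: set[int], g: dict[tuple[int, str], int], s_init: int) -> bool:
    """Assumed tree test: root has no incoming arc, every other state exactly one.

    Assumes every state is reachable from s_init; without that, a detached
    cycle could also satisfy the indegree condition.
    """
    deg = indegree(g)
    return deg[s_init] == 0 and all(deg[s] == 1 for s in states if s != s_init)

# Trie for the words "ab" and "ac": a tree, hence T-type under the assumption.
trie = {(0, "a"): 1, (1, "b"): 2, (1, "c"): 3}
# Redirecting a new arc (0, 'b') into state 2 gives state 2 two incoming arcs.
shared = {**trie, (0, "b"): 2}
```

Here `is_t_type({0, 1, 2, 3}, trie, 0)` holds, but it fails for `shared`, because state 2 then has indegree 2, the situation a general machine introduces.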
Vector representation of holes
In the two-array structure, it is difficult to minimize the number of holes (unused entries) in the tables for arbitrary goto functions of the T-type finite state machine. Algorithm A (given later) simplifies the problem by dividing the reduction process into subprocesses and minimizing the number of holes within each one. We represent the table by a bit vector $b_1 b_2 \cdots b_i \cdots b_n$, where $b_i = 0$ if the $i$th entry is a hole and $b_i = 1$ otherwise.
Definition 2. Let K′ be a set of states s such that
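The following Python sketch shows why a bit-vector representation makes hole searching cheap. It is not Algorithm A, which this excerpt does not show; it is an ordinary first-fit placement, assuming a state's outgoing labels are encoded as integer codes and collected into a mask. Entry i of the table corresponds to bit i−1 of a Python integer, so b_1 is the least significant bit; all names are hypothetical.

```python
def as_bits(occupied: int, n: int) -> str:
    """Render the vector as b1 b2 ... bn (1 = used entry, 0 = hole)."""
    return "".join("1" if (occupied >> i) & 1 else "0" for i in range(n))

def first_fit(occupied: int, label_mask: int) -> int:
    """Smallest base b such that every slot b + a (a in the labels) is a hole.

    One AND of two integers tests all labels of the state at once.
    """
    b = 0
    while occupied & (label_mask << b):
        b += 1
    return b

def place(occupied: int, label_mask: int) -> tuple[int, int]:
    """Place a state's labels and return (base, updated occupancy vector)."""
    b = first_fit(occupied, label_mask)
    return b, occupied | (label_mask << b)

occupied = 0b0101                        # b1..b4 = 1010: entries 2 and 4 are holes
b, occupied = place(occupied, 0b101)     # labels with codes 0 and 2
# b == 1: the labels land in entries 2 and 4, and as_bits(occupied, 4) == "1111"
```

Because a whole candidate placement is tested with one shift and one AND, scanning for a fitting base costs one integer operation per offset rather than one comparison per label.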
Revised Algorithm A
A general finite state machine differs from the T-type finite state machine in that the former has a state s such that indegree(s) > 1. If the data structure of Section 3 is used for such a state s, the following undesirable situation arises.
Suppose that g(s′,a′)=s and g(s″,a″)=s for s′ and s″ in K and for a′ and a″ in I. Algorithm A then defines two values j′ and j″ for new(s) such that … Thus, the two values must be merged into one. Our approach is very convenient for this
Conclusion
This paper has described a method that improves the triple-array method for storing and retrieving transition tables. The technique can also be used to reduce sparse matrices [9], [10] and DAWGs (directed acyclic word graphs) [11], and the procedure MERGE() can be used to construct hash functions for the table base, since it operates on the table base. Because the bit vectors make the procedure efficient, it is also applicable to compiler bookkeeping, where the static condition does not hold.
References
- et al.
- et al.
- K. Kita, A study on language modeling for speech recognition, Ph.D. thesis, Waseda University, ...
- et al., Efficient string matching: an aid to bibliographic search, Commun. ACM (1975).
- J. Aoe, Computer Algorithms – String Pattern Matching Strategies, IEEE Computer Society Press, Silver Spring, MD, ...