A fast and compact technique of implementing transition tables for finite state automata

doi:10.1016/S0020-0255(00)00060-8

Information Sciences

Volume 129, Issues 1–4, November 2000, Pages 141-154

https://doi.org/10.1016/S0020-0255(00)00060-8

Abstract

A finite state automaton (FA) is a basic machine in computer science, and making its transition tables fast and compact is very important for the overall performance of modules that use the machine. The two-dimensional array and the list form are well-known representations, but they have contrasting features with respect to retrieval time and space efficiency. A.V. Aho and J.D. Ullman [Principles of Compiler Design, Addison-Wesley, Reading, MA, 1986, pp. 74–118] proposed a triple-array method that combines the compactness of the list with the fast retrieval of the array, but they did not discuss how to make the structure compact without degrading the fast retrieval. This paper proposes a method of compressing the triple-array structures. Although the approach is not optimal, it does realize a reasonable reduction in size. Simulation results verify that the presented method can produce fast and compact structures.

Introduction

A finite state automaton (FA) is a well-known machine in many areas of computer science. Examples include automata for the lexical and syntax analyzers (LR parsing) of a compiler [1], [2], voice recognition [3], bibliographic search [4], [5], [6], spelling checking [7], sequential circuits and so on.

Storing and retrieving the transition tables (i.e., the goto functions) of the FA efficiently is an important topic of study because data retrieval is the most time-consuming part of many programs, and the use of a good method rather than a bad one often leads to a substantial increase in processing speed. In implementing the state transitions of the FA, what matters is how efficiently the transitions, or arcs, defined by the goto function can be stored and retrieved.

Typical data structures for storing the goto functions are the matrix form and the linear list form. The former stores the defined goto functions together with the undefined ones, whereas the latter stores only the defined goto functions. Each of these data structures has desirable and undesirable features [1], [7], [8]: the matrix offers fast access at the cost of space, and the list is compact at the cost of access time.
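The contrast between the two forms can be illustrated with a minimal sketch. The three-state machine, the symbol set and every name below (`COLUMN`, `matrix`, `lists`, `goto_matrix`, `goto_list`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical 3-state machine over the symbols 'a' and 'b' (illustration only).
COLUMN = {"a": 0, "b": 1}   # column index of each input symbol
UNDEFINED = -1              # marks an undefined goto function

# Matrix form: one row per state, one column per symbol; holes are stored too.
matrix = [
    [1, UNDEFINED],          # state 0: a -> 1
    [UNDEFINED, 2],          # state 1: b -> 2
    [UNDEFINED, UNDEFINED],  # state 2: no transitions
]

# List form: only the defined transitions are stored for each state.
lists = [
    [("a", 1)],
    [("b", 2)],
    [],
]

def goto_matrix(state, symbol):
    """Look up a transition by direct indexing."""
    return matrix[state][COLUMN[symbol]]

def goto_list(state, symbol):
    """Look up a transition by scanning the state's list of arcs."""
    for sym, target in lists[state]:
        if sym == symbol:
            return target
    return UNDEFINED

matrix_cells = sum(len(row) for row in matrix)   # cells stored by the matrix
list_cells = sum(len(arcs) for arcs in lists)    # arcs stored by the lists
```

In this toy case the matrix stores six cells for two defined transitions, while the list form stores only the two arcs but must scan them on lookup.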

Aho and Ullman [1] introduced three one-dimensional arrays, called a triple array, which combine the fast access of an array, with worst-case time complexity O(1), with the compactness of the list. The triple-array method is suitable when we wish to emphasize the time and space efficiency of the reduced machine rather than the time spent in reducing the goto functions of the machine. We call such a machine a static finite state machine. A lexical analyzer [1], [2] is a typical example, since it is never modified by the user and is the only process that must examine the input one character at a time. Static machines arise in many applications, such as voice recognition, spelling checking and parsing.
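A lookup in a structure of this kind can be sketched as follows. This is a minimal illustration under assumptions, not the layout used in the paper: the array names `base` and `check` follow the identifiers that appear later in this text, while `next_`, the symbol codes in `CODE` and the tiny example machine are hypothetical.

```python
# Hypothetical machine: g(0,a)=1, g(0,c)=2, g(1,b)=2; state 2 has no arcs.
CODE = {"a": 0, "b": 1, "c": 2}   # assumed numeric code of each symbol
FAIL = -1                         # returned for an undefined goto function

base = [0, 0, 0]    # base[s]: offset of state s's row in the shared arrays
next_ = [1, 2, 2]   # next_[t]: target state stored in slot t
check = [0, 1, 0]   # check[t]: the state that owns slot t

def goto(state, symbol):
    """Return g(state, symbol) with one index computation and one comparison."""
    t = base[state] + CODE[symbol]
    if 0 <= t < len(check) and check[t] == state:
        return next_[t]
    return FAIL
```

Here the rows of states 0 and 1 are interleaved into three shared slots, whereas a full matrix for three states and three symbols would need nine cells; the `check` comparison is what keeps one state from reading a slot owned by another.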

This paper proposes an improved method for storage and retrieval of transition tables. We first propose an algorithm for reducing machines with a tree structure and then extend the approach to general transition tables. Simulation results verify that the presented method can produce compact structures.

Section snippets

A finite state machine

We define a deterministic finite state machine (FA) M as follows: $M=(K,I,g,s_{I},F),$ where K and I are a finite set of states and a finite set of input symbols, respectively, g is the goto function mapping from K×I to K, s_I is the initial state in K, and F⊆K is the finite set of final states.
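The five components of M can be written down directly, as in the sketch below. The concrete states, symbols and transitions are hypothetical, and storing g as a dictionary is an assumption made for brevity rather than one of the representations discussed in the paper.

```python
# Hypothetical machine M = (K, I, g, s_I, F) that accepts only the word "ab".
K = {0, 1, 2}                      # finite set of states
I = {"a", "b"}                     # finite set of input symbols
g = {(0, "a"): 1, (1, "b"): 2}     # goto function; missing pairs are undefined
s_I = 0                            # initial state
F = {2}                            # final states

def accepts(word):
    """Run M on word and report whether it stops in a final state."""
    state = s_I
    for symbol in word:
        state = g.get((state, symbol))
        if state is None:          # undefined goto function: reject
            return False
    return state in F
```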

Since the transition table of a finite state machine that recognizes a finite number of words forms a tree structure, we define the following finite state machine.

Definition 1

Let indegree(s) be the number of transitions

Vector representation of holes

In the two-array structure, it is difficult to minimize the number of holes in the tables for arbitrary goto functions of the T-type finite state machine, but Algorithm A simplifies the problem by dividing the reduction process into subprocesses and minimizing the number of holes in each subprocess (Algorithm A is shown later). We introduce a data structure of bit vectors b₁b₂⋯b_i⋯b_n in which b_i=0 if the i-th entry in the table $check$ is a hole, and b_i=1 otherwise.
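The bit-vector view of holes can be sketched as follows. The sketch is not Algorithm A: it only shows how a vector of occupied entries might be built from $check$ and how a bitwise AND can test whether the arcs of a state fit into holes at a given shift. The `HOLE` marker, the function names and the first-fit search are all assumptions made for illustration.

```python
HOLE = -1   # assumed marker for an unused entry of the check table

def occupancy_bits(check):
    """Return the bits b_1 ... b_n: 0 where check holds a hole, 1 otherwise."""
    return [0 if entry == HOLE else 1 for entry in check]

def pack(bits):
    """Pack a list of bits into an int, with bits[i] stored at bit position i."""
    value = 0
    for i, bit in enumerate(bits):
        value |= bit << i
    return value

def first_fit(table_bits, state_bits):
    """Return the smallest shift at which every arc of the state lands on a hole.

    Positions past the end of the table count as holes, so the loop terminates.
    """
    table, state = pack(table_bits), pack(state_bits)
    shift = 0
    while table & (state << shift):   # non-zero AND means a collision
        shift += 1
    return shift

# Usage: a hypothetical check table with holes at positions 1, 4 and 5.
table_bits = occupancy_bits([0, HOLE, 0, 1, HOLE, HOLE])
```

With this table, a state whose arcs use symbol codes 0 and 1 (bits `[1, 1]`) first fits at shift 4, and a state using codes 0 and 3 (bits `[1, 0, 0, 1]`) first fits at shift 1. Each candidate shift is tested with a single AND, which is the kind of saving a bit-vector representation makes possible.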

Definition 2

Let K^′ be a set of states s such that

Revised Algorithm A

A general finite state machine differs from the T-type finite state machine in that the former has a state s such that $indegree(s) \geq 2$. If the data structure of Section 3 is used for s, then the following undesirable feature arises.

Suppose that g(s′,a′)=s and g(s″,a″)=s for s, s′ and s″ in K and for a′ and a″ in I. Algorithm A defines two values j′ and j″ for new(s) such that $j' = base[new(s')] + a'$ and $j'' = base[new(s'')] + a''$. Thus, we must merge the two values into one. Our approach is very convenient for this

Conclusion

This paper has described a method of improving the triple-array method for storage and retrieval of transition tables. Our technique can be used to reduce sparse matrices [9], [10] and DAWGs [11], and the procedure MERGE($check, vec(s)$) can also be used to construct hash functions for the table base, since the procedure operates on the table base. Because the procedure is very efficient by virtue of the bit vectors, it is also applicable to compiler bookkeeping, where the static condition does not hold.

References (11)

A.V. Aho et al.
P.M. Lewis et al.
K. Kita, A study on language modeling for speech recognition, Ph.D thesis, Waseda University,...
A.V. Aho et al., Efficient string matching: an aid to bibliographic search, Commun. ACM (1975)
J. Aoe, Computer Algorithms – String Pattern Matching Strategies, IEEE Computer Society Press, Silver Spring, MD,...

There are more references available in the full text version of this article.

