Synthesis of High-Speed Finite State Machines in FPGAs by State Splitting

Salauyou, Valery

doi:10.1007/978-3-319-45378-1_64


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9842)

Included in the following conference series:

IFIP International Conference on Computer Information Systems and Industrial Management


Abstract

A method is proposed for synthesizing high-speed finite state machines (FSMs) in field programmable gate arrays (FPGAs) based on LUTs (Look Up Tables) by splitting internal states. The method can easily be included in the design flow of FPGA-based digital systems. Estimates of the number of LUT levels needed to implement FSM transition functions are presented for sequential and parallel decomposition. Algorithms for splitting FSM internal states in order to synthesize high-speed FSMs are described. Experimental results show the high efficiency of the proposed method: in some cases, FSM performance increases by a factor of 1.52. Finally, the experimental results are discussed and prospective directions for designing high-speed FSMs are outlined.

1 Introduction

As a rule, a digital system, as well as its large functional blocks and nodes, includes a control device, or controller. The speed of a digital system and its functional blocks depends directly on the speed of their control devices. Most control devices and controllers are modeled mathematically as finite state machines (FSMs). Therefore, methods for synthesizing high-speed FSMs are necessary for designing high-performance digital systems. Note that implementation cost can be neglected in the synthesis of high-speed FSMs, because an FSM occupies only a small part of the area compared with other system components (for example, memory or transceivers).

Programmable logic devices (PLDs) are now widely used for designing digital systems. Two PLD architectures are common: one based on two programmable matrices (AND and OR), and one based on function generators called LUTs (Look Up Tables). PLDs of the first type are called Complex Programmable Logic Devices (CPLDs), and PLDs of the second type are called Field Programmable Gate Arrays (FPGAs). An FPGA can be viewed as a large number of LUTs joined by programmable interconnections. Each LUT can implement any Boolean function of a small number of arguments (typically from 4 to 6). Methods of FSM synthesis for CPLDs are considered in [1].

Many authors have considered the synthesis of high-speed FSMs on PLDs, using a wide variety of approaches. In [2], a technique is presented for improving the performance of a synchronous circuit implemented in a LUT-based FPGA without changing the initial circuit configuration: only the register locations are altered, which improves clock speed and data throughput at the expense of latency. In [3], methods and tools for state encoding and combinational synthesis of sequential circuits based on new information-flow optimization criteria are considered. In [4], a timing optimization technique is proposed for complex FSMs that contain not only random logic but also data operators. The technique, based on the concept of a catalyst, adds a functionally redundant block (a piece of combinational logic and several registers) to the circuit so that timing-critical paths are divided into stages. In [5, 6], VHDL description styles for FSMs and known state assignment methods for FSM implementation are studied. In [7], evolutionary methods are applied to FSM synthesis: first, state assignment is solved by genetic algorithms; then evolutionary algorithms are used to minimize the chip area and the time delay of the FSM output signals. In [8], state assignment and combinational circuit optimization for high-speed FSMs implemented in CPLDs are considered. In [9], a novel architecture specifically optimized for implementing reconfigurable FSMs, the Transition-based Reconfigurable FSM (TR-FSM), is presented; compared with FPGA architectures, it considerably reduces area, delay, and power consumption. In [10], a new machine model, the finite virtual state machine (FVSM), is proposed, together with a memory-based architecture for implementing FVSMs and a technique for generating FVSMs from traditional FSMs; FVSMs implemented on this architecture are faster than traditional RAM-based FSM implementations. In [11], FSM implementation in FPGAs using embedded ROM blocks is considered, and two FSM architectures with multiplexers at the ROM block inputs are proposed that reduce area and increase FSM speed. In [12], the number of arguments of transition functions is reduced by state splitting, which reduces both area and time delay in FPGA implementations of FSMs.

This paper also uses FSM state splitting, but here the purpose of splitting is to increase FSM performance in LUT-based FPGAs. State splitting is an equivalent transformation of an FSM and does not change its behavior. During state splitting, the machine type (Mealy or Moore) is preserved, the general structure of the FSM does not change, and embedded FPGA memory blocks are not used. The hierarchy of state names is also preserved, which simplifies analysis and debugging of the project. For these reasons, the proposed synthesis method is aimed at practical use and can easily be included in the general design flow of digital systems.

This paper is organized as follows. Section 2 gives estimates of the number of LUT levels needed to implement FSM transition functions with sequential and parallel decomposition. Section 3 presents the synthesis method for high-speed FSMs, which consists of two algorithms: a general algorithm and an algorithm for splitting a particular state. The method is illustrated by a detailed example. Experimental results are reported in Sect. 4, and Sect. 5 concludes the paper.

2 Estimations for the Number of LUT Levels for Transition Functions

Let A = {a₁, …, a_M} be the set of internal states, X = {x₁, …, x_L} be the set of input variables, Y = {y₁, …, y_N} the set of output variables, and D = {d₁, …, d_R} the set of transition functions of an FSM.

One-hot state assignment is traditionally used for synthesizing high-speed FSMs in FPGAs. With this encoding, each internal state a_i (a_i ∈ A) corresponds to a separate flip-flop of the FSM memory, and setting this flip-flop to 1 means that the FSM is in the given state. The data input of each flip-flop is driven by a transition function d_i ∈ D, i.e., each internal state a_i of the FSM has its own transition function $ d_{i}, i = \overline{1,M} $.
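To make the one-hot scheme concrete, the following minimal Python sketch evaluates every transition function d_i for one clock cycle. The FSM, its state names, and the representation of transition conditions as product terms ({input: value}) are hypothetical assumptions of this illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (source, target, condition), where the
# condition is a product term {input_name: required_value}; {} = unconditional.
Transition = Tuple[str, str, Dict[str, int]]


def next_state_bits(states: List[str], transitions: List[Transition],
                    q: Dict[str, int], x: Dict[str, int]) -> Dict[str, int]:
    """Evaluate every one-hot transition function d_i for one clock cycle.

    d_i is the OR, over all transitions a_m -> a_i, of the flip-flop output
    q_m AND-ed with the transition condition on the inputs.
    """
    d = {s: 0 for s in states}
    for src, dst, cond in transitions:
        if q[src] == 1 and all(x[v] == val for v, val in cond.items()):
            d[dst] = 1
    return d


# Hypothetical three-state FSM (not the FSM of Fig. 3).
STATES = ["a1", "a2", "a3"]
TRANSITIONS: List[Transition] = [
    ("a1", "a2", {"x1": 1}),
    ("a1", "a3", {"x1": 0}),
    ("a2", "a3", {}),
    ("a3", "a1", {}),
]
q_now = {"a1": 1, "a2": 0, "a3": 0}  # one-hot code: the FSM is in state a1
print(next_state_bits(STATES, TRANSITIONS, q_now, {"x1": 1}))
# {'a1': 0, 'a2': 1, 'a3': 0}
```

The sketch shows that d_i depends only on the flip-flop outputs of the states in B(a_i) and on the inputs that appear in conditions of transitions into a_i.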

Let X(a_m,a_i) be the set of FSM input variables whose values initiate the transition from state a_m to state a_i (a_m, a_i ∈ A). To implement a transition from a_m to a_i, it is necessary to check the output of the flip-flop for the source state a_m (one bit) and the values of the input variables in X(a_m,a_i). To implement the transition function d_i, it is therefore necessary to check the flip-flop outputs of all states from which transitions lead to a_i, i.e., |B(a_i)| values, where B(a_i) is the set of states from which transitions terminate in state a_i and |·| denotes the cardinality of a set. In addition, it is necessary to check the values of all input variables that initiate transitions to a_i, i.e., |X(a_i)| values, where X(a_i) is the set of input variables whose values initiate transitions to state a_i: $ X(a_{i}) = \bigcup\limits_{a_{m} \in B(a_{i})} X(a_{m}, a_{i}) $.

Let r_i be the rank of the transition function d_i, defined as

$$ r_{i} = \left| B(a_{i}) \right| + \left| X(a_{i}) \right|. \tag{1} $$
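Below is a minimal Python sketch of expression (1). It assumes that transitions are given as (source, target, set of significant inputs) triples. The concrete input sets are hypothetical. The example in Sect. 3 only implies that each of the two transitions into a₂ uses five inputs and that together they use all ten, which gives the rank 12 reported in Sect. 5.

```python
from typing import FrozenSet, List, Set, Tuple

# (source, target, input variables with a significant value on the transition)
Transition = Tuple[str, str, FrozenSet[str]]


def rank(target: str, transitions: List[Transition]) -> int:
    """Rank r_i = |B(a_i)| + |X(a_i)| of transition function d_i, expression (1)."""
    b: Set[str] = set()  # B(a_i): states with a transition into target
    x: Set[str] = set()  # X(a_i): union of X(a_m, a_i) over a_m in B(a_i)
    for src, dst, xs in transitions:
        if dst == target:
            b.add(src)
            x |= xs
    return len(b) + len(x)


# Hypothetical split of x1..x10 between the two transitions into a2.
X1 = frozenset(f"x{j}" for j in range(1, 6))   # x1..x5
X2 = frozenset(f"x{j}" for j in range(6, 11))  # x6..x10
T = [("a1", "a2", X1), ("a6", "a2", X2)]
print(rank("a2", T))  # 12
```

A source state is counted only once in B(a_i) even if it has several transitions into a_i. Input variables shared by several transitions are likewise counted once in X(a_i).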

Let n be the number of LUT inputs. If the rank r_i of transition function $ d_{i} (i = \overline{1,M}) $ exceeds n, d_i must be decomposed and implemented with several LUTs.

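Note that splitting internal states cannot lower the rank of the transition functions below the value

$$ r^{*} = \max \left( |X(a_{m}, a_{s})| \right) + 1, \quad m, s = \overline{1,M}, \tag{2} $$

because even a single transition from a_m to a_s requires the flip-flop output of a_m together with all variables of X(a_m,a_s).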
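Here is a hedged Python sketch of expression (2). It uses the same hypothetical transition representation (source, target, significant inputs). The transition list is invented for illustration and reproduces r* = 6 from the example in Sect. 3.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, List, Set, Tuple

# (source, target, input variables with a significant value on the transition)
Transition = Tuple[str, str, FrozenSet[str]]


def rank_lower_bound(transitions: List[Transition]) -> int:
    """r* = max |X(a_m, a_s)| + 1, expression (2).

    X(a_m, a_s) collects the inputs of all transitions from a_m to a_s;
    the "+ 1" accounts for the flip-flop output of the source state a_m.
    """
    per_pair: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for src, dst, xs in transitions:
        per_pair[(src, dst)] |= xs
    return max(len(xs) for xs in per_pair.values()) + 1


# Hypothetical transitions: two 5-input transitions and one unconditional one.
T = [
    ("a1", "a2", frozenset({"x1", "x2", "x3", "x4", "x5"})),
    ("a6", "a2", frozenset({"x6", "x7", "x8", "x9", "x10"})),
    ("a3", "a4", frozenset()),
]
print(rank_lower_bound(T))  # 6
```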

In the proposed method, r* is used as the upper bound on the ranks of the transition functions when splitting FSM states.

There are two basic approaches to the decomposition of Boolean functions: sequential and parallel. In sequential decomposition, all LUTs are connected in a chain (Fig. 1).

The first LUT receives n arguments of function d_i, and each remaining LUT receives n − 1 further arguments plus the output of the previous LUT in the chain. Thus, the number $ l_{i}^{s} $ of LUT levels for sequential decomposition of a transition function d_i with rank r_i is given by

$$ l_{i}^{s} = \text{int} \left( \frac{r_{i} - n}{n - 1} \right) + 1, \tag{3} $$

where int(A) is the least integer greater than or equal to A (i.e., the ceiling of A).
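Expression (3) can be checked with exact integer arithmetic. In this sketch, the helper names are hypothetical, and int(·) is implemented as the ceiling, as defined above.

```python
def ceil_div(a: int, b: int) -> int:
    """Exact ceiling of a / b for b > 0 (also correct when a is negative)."""
    return -(-a // b)


def levels_sequential(r: int, n: int) -> int:
    """LUT levels for sequential (chain) decomposition, expression (3).

    The first LUT takes n arguments; each further LUT takes n - 1 new
    arguments plus the output of the previous LUT in the chain.
    """
    if n < 2:
        raise ValueError("a chain needs LUTs with at least 2 inputs")
    return ceil_div(r - n, n - 1) + 1


for r in (6, 7, 11, 12, 16):
    print(r, levels_sequential(r, n=6))
```

With n = 6, ranks 2 to 6 fit in one LUT, ranks 7 to 11 need two levels, and ranks 12 to 16 need three. The expression yields 0 for r ≤ 1, which corresponds to a transition function that is just a wire from one flip-flop.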

In parallel decomposition, the LUTs are connected in a hierarchical tree structure (Fig. 2).

The function arguments are fed to the inputs of the first-level LUTs, and the intermediate functions are fed to the inputs of the LUTs at all subsequent levels. Thus, the number of LUT levels for parallel decomposition of a transition function d_i with rank r_i is given by

$$ l_{i}^{p} = \text{int} \left( \log_{n} r_{i} \right). \tag{4} $$
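Here is a corresponding sketch of expression (4), again with hypothetical function names. It computes int(log_n r_i) as the smallest l with n^l ≥ r_i. It does not call a floating-point logarithm, because that can be off by rounding at exact powers of n and so give a ceiling one level too high.

```python
def levels_parallel(r: int, n: int) -> int:
    """LUT levels for parallel (tree) decomposition, expression (4).

    Returns the smallest l with n**l >= r, i.e. int(log_n r) with int taken as
    the ceiling, using exact integer arithmetic instead of a float logarithm.
    """
    if n < 2:
        raise ValueError("a tree needs LUTs with at least 2 inputs")
    levels, capacity = 0, 1
    while capacity < r:
        capacity *= n  # one more tree level multiplies the fan-in by n
        levels += 1
    return levels


for r in (6, 7, 12, 36, 37):
    print(r, levels_parallel(r, n=6))
```

For the rank r = 12 of state a₂ in the example of Sect. 3, the tree needs two levels, whereas the chain of expression (3) needs three.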

It is difficult to predict which type of decomposition (sequential or parallel) a particular synthesizer uses. Preliminary research showed that, for example, Altera's Quartus II design tool uses both sequential and parallel decomposition simultaneously. Hence, the number l_i of LUT levels in an FPGA implementation of transition function d_i with rank r_i can lie between $ l_{i}^{p} $ and $ l_{i}^{s} $, $ i = \overline{1,M} $.

Let k be an integer coefficient (k ∈ [0, 10]) that adapts the algorithm's estimate of the number of LUT levels to a specific synthesizer. Then the number l_i of LUT levels for the implementation of transition function d_i with rank r_i is defined by the following expression:

$$ l_{i} = \text{int} \left( \frac{10 - k}{10} l_{i}^{p} + \frac{k}{10} l_{i}^{s} \right). \tag{5} $$

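The specific value of k depends on the FPGA architecture and the synthesizer used. For k = 0, expression (5) reduces to the parallel estimate $ l_{i}^{p} $, and for k = 10 it reduces to the sequential estimate $ l_{i}^{s} $.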
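Expression (5) can be written as a small Python function (hypothetical name) that is evaluated as a single integer ceiling. The usage lines plug in l^s = 3 and l^p = 2, the values obtained for state a₂ in the example of Sect. 3.

```python
def ceil_div(a: int, b: int) -> int:
    """Exact ceiling of a / b for b > 0."""
    return -(-a // b)


def levels_mixed(l_s: int, l_p: int, k: int) -> int:
    """Expression (5): int((10 - k)/10 * l^p + k/10 * l^s), int = ceiling.

    Computed as one integer ceiling of ((10 - k) * l^p + k * l^s) / 10,
    so no floating-point rounding is involved.
    """
    if not 0 <= k <= 10:
        raise ValueError("k must lie in [0, 10]")
    return ceil_div((10 - k) * l_p + k * l_s, 10)


for k in (0, 5, 10):
    print(k, levels_mixed(l_s=3, l_p=2, k=k))
```

Because of the ceiling, any k ≥ 1 already rounds the estimate up to the sequential value when l^s and l^p differ by one level.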

The next question is when to stop splitting FSM states. Splitting state $ a_{i} (i = \overline{1,M}) $ not only increases the number M of FSM states but also increases the number of transitions into the states of set A(a_i), where A(a_i) is the set of states in which transitions from a_i terminate. When a_i is split, the cardinalities of the sets B(a_m), a_m ∈ A(a_i), increase. Therefore, according to (1), the ranks of the transition functions of the states in A(a_i) grow, which can increase the values $ l_{m}^{s} $, $ l_{m}^{p} $, and l_m for these states.
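The effect described above can be reproduced with a short sketch of the splitting transformation itself. The sketch makes these assumptions:

- transitions are (source, target, significant inputs) triples;
- the split state has no self-loop;
- copies are named a2_1, a2_2, and so on;
- a₃ as the successor of a₂ is hypothetical, chosen only to show how the rank of a successor grows.

In a Moore FSM, each copy would keep the outputs of the original state. Output logic is not modelled here.

```python
from typing import FrozenSet, List, Tuple

# (source, target, input variables with a significant value on the transition)
Transition = Tuple[str, str, FrozenSet[str]]


def rank(target: str, transitions: List[Transition]) -> int:
    """r_i = |B(a_i)| + |X(a_i)|, expression (1)."""
    into = [t for t in transitions if t[1] == target]
    b = {src for src, _, _ in into}
    x = set().union(*(xs for _, _, xs in into))
    return len(b) + len(x)


def split_state(transitions: List[Transition], state: str,
                groups: List[List[int]]) -> List[Transition]:
    """Replace `state` by copies state_1, state_2, ... (hypothetical naming).

    groups[h] lists indices into the transitions entering `state`; those
    transitions are redirected to copy h + 1. Every outgoing transition of
    `state` is duplicated for each copy, so its successors gain predecessors.
    """
    if any(s == state and d == state for s, d, _ in transitions):
        raise ValueError("self-loops on the split state are not handled")
    incoming = [t for t in transitions if t[1] == state]
    outgoing = [t for t in transitions if t[0] == state]
    result = [t for t in transitions if state not in (t[0], t[1])]
    for h, group in enumerate(groups, start=1):
        new_state = f"{state}_{h}"
        result += [(incoming[i][0], new_state, incoming[i][2]) for i in group]
        result += [(new_state, dst, xs) for _, dst, xs in outgoing]
    return result


X1 = frozenset(f"x{j}" for j in range(1, 6))   # hypothetical inputs of a1 -> a2
X2 = frozenset(f"x{j}" for j in range(6, 11))  # hypothetical inputs of a6 -> a2
T = [("a1", "a2", X1), ("a6", "a2", X2), ("a2", "a3", frozenset())]
print(rank("a3", T))  # 1: B(a3) = {a2}
T_split = split_state(T, "a2", groups=[[0], [1]])
print(rank("a2_1", T_split), rank("a2_2", T_split), rank("a3", T_split))  # 6 6 2
```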

In the proposed algorithm, state splitting stops when the following condition is met:

$$ l_{\max} \le \text{int} \left( l_{mid} \right), \tag{6} $$

where l_max is the number of LUT levels needed to implement the “worst” transition function, i.e., the one with the maximum rank, and l_mid is the arithmetic mean of the number of LUT levels over all transition functions. Note that as FSM internal states are split, l_mid increases and l_max decreases; therefore, the algorithm always terminates.
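Condition (6) can be written as a Python predicate (hypothetical name). The level list used in the usage example is a hypothetical distribution consistent with the example in Sect. 3: six states, levels summing to 8, and l_max = 3.

```python
from typing import List


def ceil_div(a: int, b: int) -> int:
    """Exact ceiling of a / b for b > 0."""
    return -(-a // b)


def splitting_finished(levels: List[int]) -> bool:
    """Stopping condition (6): l_max <= int(l_mid), with int = ceiling."""
    l_max = max(levels)
    int_l_mid = ceil_div(sum(levels), len(levels))  # ceiling of the mean
    return l_max <= int_l_mid


before = [1, 3, 1, 1, 1, 1]  # hypothetical: six states, sum 8, l_max = 3
after = [1] * 7              # seven states after splitting a2, all at 1 level
print(splitting_finished(before), splitting_finished(after))  # False True
```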

3 Method for High-Speed FSM Synthesis

Based on the above discussion, the state splitting algorithm for high-speed FSM synthesis (Algorithm 1) is as follows.

Further synthesis of the FSM is performed by traditional techniques, for example, automatically by the synthesizer of a design tool. For this purpose, it is sufficient to describe the FSM obtained after splitting internal states in a hardware description language (Verilog or VHDL). The value of coefficient k (step 1 of Algorithm 1) is determined empirically by synthesizing test examples in the design tool used.

To split a state $ a_{i}, i = \overline{1,M} $ (step 6 of Algorithm 1), a Boolean matrix W is constructed as follows. Let C(a_i) be the set of transitions into state a_i. The rows of W correspond to the elements of C(a_i). The columns of W are divided into two parts according to the two types of arguments of transition function d_i: the first part corresponds to the set B(a_i) of states from which transitions terminate in a_i, and the second part corresponds to the set X(a_i) of input variables whose values initiate transitions into a_i. A one is placed at the intersection of row t ($ t = \overline{1,T} $, T = |C(a_i)|) and column j of the first part if transition c_t (c_t ∈ C(a_i)) starts from state a_j (a_j ∈ B(a_i)). A one is placed at the intersection of row t and column j of the second part if input variable x_j (x_j ∈ X(a_i)) takes a significant value (0 or 1) on transition c_t. The task then reduces to partitioning the rows of W into a minimum number H of row minors $ W_{1}, \ldots, W_{H} $ such that, in each minor W_h ($ h = \overline{1,H} $), the number of columns containing ones does not exceed the value r* defined by (2). The rows of each minor W_h define the transitions into a new state $ a_{i_h} $ ($ h = \overline{1,H} $).
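Here is a minimal sketch of the construction of W. It assumes that transitions are (source, target, significant inputs) triples and that input names have the form x<j>. The transition data are hypothetical input sets for state a₂; the actual sets are defined by Fig. 3.

```python
from typing import FrozenSet, List, Tuple

# (source, target, input variables with a significant value on the transition)
Transition = Tuple[str, str, FrozenSet[str]]


def build_w(target: str,
            transitions: List[Transition]) -> Tuple[List[str], List[List[int]]]:
    """Boolean matrix W for splitting state `target`.

    Rows: the transitions of C(a_i), i.e. those entering `target`.
    Columns: the states of B(a_i) first, then the inputs of X(a_i).
    """
    into = [t for t in transitions if t[1] == target]  # C(a_i)
    b_cols = sorted({src for src, _, _ in into})        # B(a_i)
    # X(a_i); sorting by index assumes input names of the form "x<j>".
    x_cols = sorted(set().union(*(xs for _, _, xs in into)),
                    key=lambda v: int(v[1:]))
    rows = [[int(c == src) for c in b_cols] + [int(c in xs) for c in x_cols]
            for src, _, xs in into]
    return b_cols + x_cols, rows


X1 = frozenset(f"x{j}" for j in range(1, 6))   # hypothetical inputs of a1 -> a2
X2 = frozenset(f"x{j}" for j in range(6, 11))  # hypothetical inputs of a6 -> a2
columns, W = build_w("a2", [("a1", "a2", X1), ("a6", "a2", X2)])
print(columns)
for row in W:
    print(row)  # each row has 6 ones: one state column and five inputs
```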

Let w_t be a row of matrix W. The following algorithm (Algorithm 2) can be used to find a partition of the rows of W into a minimum number H of row minors $ W_{1}, \ldots, W_{H} $.
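As an illustration only, the row partitioning can be approximated by a greedy first-fit heuristic. Rows are taken in order of decreasing number of ones, and each row joins the first minor whose combined set of one-columns stays within r*. This heuristic is an assumption of the sketch, not the author's Algorithm 2, and it does not always find the minimum H. Rows are given as sets of the column indices that hold a one.

```python
from typing import List, Set


def partition_rows(rows: List[Set[int]], r_star: int) -> List[List[int]]:
    """Greedy first-fit partition of W's rows into row minors.

    rows[t] is the set of column indices holding a one in row w_t. Each minor
    may use at most r_star distinct columns. Heuristic: H is not guaranteed
    to be minimal.
    """
    order = sorted(range(len(rows)), key=lambda t: len(rows[t]), reverse=True)
    groups: List[List[int]] = []  # row indices of each minor
    cols: List[Set[int]] = []     # columns with ones in each minor
    for t in order:
        for g, used in enumerate(cols):
            if len(used | rows[t]) <= r_star:
                groups[g].append(t)
                used |= rows[t]  # grows cols[g] in place
                break
        else:  # no existing minor can take the row: open a new one
            groups.append([t])
            cols.append(set(rows[t]))
    return groups


# Rows of W for state a2 with hypothetical input sets
# (columns 0-1: a1, a6; columns 2-11: x1..x10).
rows_a2 = [{0, 2, 3, 4, 5, 6}, {1, 7, 8, 9, 10, 11}]
print(partition_rows(rows_a2, r_star=6))  # [[0], [1]]
```

For the a₂ rows, the sketch yields two single-row minors, which agrees with the partition W₁ = {w₁}, W₂ = {w₂} of the example.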

We illustrate the proposed synthesis method with an example. Suppose a high-speed FSM with the state diagram shown in Fig. 3 is to be synthesized.

This FSM is a Moore machine with 6 states $ a_{1}, \ldots, a_{6} $, 10 input variables $ x_{1}, \ldots, x_{10} $, and one output variable y. The transitions from states a₃, a₄, and a₅ are unconditional, so the logical value 1 is written on them as the transition condition. The sets B(a_i) and X(a_i) and the ranks r_i of the transition functions of the initial FSM are given in Table 1, where Ø denotes the empty set. Since max(|X(a_m,a_s)|) = 5 for this example, (2) gives r* = 6. The FSM is to be implemented in an FPGA with 6-input LUTs, i.e., n = 6.

Table 1. Values of B(a_i), X(a_i), r_i, $ l_{i}^{s} $, and $ l_{i}^{p} $ for the initial FSM


The values $ l_{i}^{s} $ and $ l_{i}^{p} $ computed for each state according to (3) and (4) are given in the corresponding columns of Table 1. Since we do not know how the compiler decomposes Boolean functions, we assume sequential decomposition (the worst case), i.e., k = 10 in expression (5). As a result, the number of LUT levels needed to implement each transition function is $ l_{i} = l_{i}^{s} $. For our example, this gives int(l_mid) = int(8/6) = 2. In other words, state splitting stops as soon as every transition function can be implemented in at most two LUT levels.

For this example, $ l_{max} = l_{2}^{s} = 3 $, i.e., condition (6) is not satisfied because of state a₂: $ l_{max} = l_{2}^{s} = 3 > \text{int}(l_{mid}) = 2 $. Therefore, state a₂ is split by Algorithm 2. The matrix W constructed for splitting state a₂ is shown in Fig. 4.
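The numbers for state a₂ can be reproduced from (3), (4), and (6) with n = 6. The rank r₂ = 12 is the value reported in Sect. 5. With |B(a₂)| = 2 (transitions from a₁ and a₆), it implies |X(a₂)| = 10.

```latex
\begin{aligned}
r_{2} &= |B(a_{2})| + |X(a_{2})| = 2 + 10 = 12,\\
l_{2}^{s} &= \text{int}\left(\frac{12 - 6}{6 - 1}\right) + 1 = \text{int}(1.2) + 1 = 3,\\
l_{2}^{p} &= \text{int}\left(\log_{6} 12\right) = \text{int}(1.39\ldots) = 2,\\
l_{\max} &= 3 > \text{int}(l_{mid}) = \text{int}(8/6) = 2.
\end{aligned}
```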

Matrix W has two rows: row w₁ corresponds to the transition from state a₁ to state a₂, and row w₂ corresponds to the transition from state a₆ to state a₂. Algorithm 2 partitions the rows of W into two subsets, W₁ = {w₁} and W₂ = {w₂}. Hence, state a₂ is split into two states, $ a_{2_1} $ and $ a_{2_2} $, as shown in Fig. 5.

The new values of B(a_i), X(a_i), r_i, $ l_{i}^{s} $, and $ l_{i}^{p} $ are given in Table 2. Now l_max = l_mid = 1, so, according to (6), Algorithm 1 terminates.

Table 2. Values of B(a_i), X(a_i), r_i, $ l_{i}^{s} $, and $ l_{i}^{p} $ after splitting state a₂


Thus, for the given FSM, splitting state a₂ reduced the number of LUT levels from 3 to 1 for sequential decomposition and from 2 to 1 for parallel decomposition.
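Here is a worked check for the two new states, using (3) and (4) with n = 6. Matrix W for a₂ has 12 columns, and each of its two rows contains at most r* = 6 ones. Each new state therefore has one predecessor and exactly five significant inputs, which agrees with the rank of 6 reported in Sect. 5.

```latex
\begin{aligned}
r_{2_1} &= r_{2_2} = 1 + 5 = 6,\\
l_{2_1}^{s} &= l_{2_2}^{s} = \text{int}\left(\frac{6 - 6}{6 - 1}\right) + 1 = 0 + 1 = 1,\\
l_{2_1}^{p} &= l_{2_2}^{p} = \text{int}\left(\log_{6} 6\right) = 1.
\end{aligned}
```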

4 Experimental Results

The efficiency of the proposed synthesis method was verified by implementing the initial FSM (Fig. 3) and the FSM obtained after splitting state a₂ (Fig. 5) in Altera FPGAs using the Quartus II design tool, version 15.0. The main optimization criterion was set to “speed”. The “one-hot” state assignment method was selected for the initial FSM, and the “user” method was selected for the synthesized FSM (the state codes are taken from the FSM description).

Table 3 presents the experimental results for various FPGA families, where nLUT₁ and nLUT₂ are the numbers of LUTs used to implement the initial and the synthesized FSM, respectively; F1 and F2 are the clock frequencies (in MHz) of the initial and the synthesized FSM, respectively; and F1/F2 is the ratio of the corresponding parameters.

Table 3. Results of the experimental research


Table 3 shows that the proposed method increased FSM performance for 5 of the 7 FPGA families. For example, performance increased by a factor of 1.52 for MAX II and by a factor of 1.35 for Cyclone V. In addition, the number of LUTs used decreased for the Arria II GX, MAX V, and MAX II families.

5 Conclusions

The experimental results show the following. In the example considered, the maximum rank of the transition functions was reduced from 12 to 6, which reduced the number of LUT levels from 3 to 1 for sequential decomposition and from 2 to 1 for parallel decomposition; nevertheless, FSM performance did not increase for all FPGA families. This illustrates the complexity of synthesizing high-speed FSMs: FSM performance depends not only on the results of logic synthesis but also on the results of placement and routing. The reduction in the number of LUTs for some FPGA families (as a result of applying the proposed method) has a simple explanation: as the number of LUT levels decreases, the number of LUTs also decreases.

This study was supported by grant S/WI/1/2013 from Bialystok University of Technology and funded from the research resources of the Ministry of Science and Higher Education.

References

1. Salauyou, V.V., Klimowicz, A.S.: Logic Design of Digital Systems on Programmable Logic Devices. Hot Line – Telecom, Moscow (2008). (in Russian)
2. Miyazaki, N., Nakada, H., Tsutsui, A., Yamada, K., Ohta, N.: Performance improvement technique for synchronous circuits realized as LUT-based FPGA’s. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 3, 455–459 (1995)
3. Jozwiak, L., Slusarczyk, A., Chojnacki, A.: Fast and compact sequential circuits through the information-driven circuit synthesis. In: Euromicro Symposium on Digital Systems Design, pp. 46–53. IEEE Press, Warsaw (2001)
4. Huang, S.-Y.: On speeding up extended finite state machines using catalyst circuitry. In: Asia and South Pacific Design Automation Conference (ASP-DAC), Yokohama, pp. 583–588, January 2001
5. Kuusilinna, K., Lahtinen, V., Hamalainen, T., Saarinen, J.: Finite state machine encoding for VHDL synthesis. IEEE Proc. Comput. Digit. Tech. 1, 23–30 (2001)
6. Rafla, N.I., Davis, B.: A study of finite state machine coding styles for implementation in FPGAs. In: 49th IEEE International Midwest Symposium on Circuits and Systems, San Juan, USA, pp. 337–341 (2006)
7. Nedjah, N., Mourelle, L.: Evolutionary synthesis of synchronous finite state machines. In: International Conference on Computer Engineering and Systems, Cairo, Egypt, pp. 19–24 (2006)
8. Czerwiński, R., Kania, D.: Synthesis method of high speed finite state machines. Bull. Pol. Acad. Sci. Tech. Sci. 4, 635–644 (2010)
9. Glaser, J., Damm, M., Haase, J., Grimm, C.: TR-FSM: Transition-based reconfigurable finite state machine. ACM Trans. Reconfig. Technol. Syst. (TRETS) 3, 23:1–23:14 (2011)
10. Senhadji-Navarro, R., Garcia-Vargas, I.: Finite virtual state machines. IEICE Trans. Inf. Syst. 10, 2544–2547 (2012)
11. Garcia-Vargas, I., Senhadji-Navarro, R.: Finite state machines with input multiplexing: a performance study. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 5, 867–871 (2015)
12. Solov’ev, V.V.: Splitting the internal states in order to reduce the number of arguments in functions of finite automata. J. Comput. Syst. Sci. Int. 5, 777–783 (2005)

Author information

Authors and Affiliations

Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
Valery Salauyou

Corresponding author

Correspondence to Valery Salauyou.

Editor information

Editors and Affiliations

Bialystok University of Technology, Bialystok, Poland
Khalid Saeed
University of Bialystok, Vilnius, Lithuania
Władysław Homenda

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Salauyou, V. (2016). Synthesis of High-Speed Finite State Machines in FPGAs by State Splitting. In: Saeed, K., Homenda, W. (eds) Computer Information Systems and Industrial Management. CISIM 2016. Lecture Notes in Computer Science, vol 9842. Springer, Cham. https://doi.org/10.1007/978-3-319-45378-1_64

DOI: https://doi.org/10.1007/978-3-319-45378-1_64
Published: 09 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45377-4
Online ISBN: 978-3-319-45378-1
eBook Packages: Computer Science, Computer Science (R0)
