High level synthesis of integrated heterogeneous pipelined processing elements for DSP applications

https://doi.org/10.1016/j.compeleceng.2004.11.005

Abstract

A technique for scheduling and processor allocation is proposed for the synthesis of integrated heterogeneous pipelined processing elements that implement digital signal processing applications. The technique achieves efficient hardware implementations at the logic level by minimizing the number of processing units used, without compromising the rate and delay optimality criteria.

The proposed algorithm is found to outperform algorithms that produce homogeneous implementations: it gives schedules with lower iteration periods, requires fewer hardware resources, and has lower time complexity at design time. Compared with existing heterogeneous algorithms, the proposed algorithm has lower time complexity and, for some applications, yields a lower iteration period. The performance of the proposed algorithm has been verified on several benchmarks.

Introduction

Digital signal processing (DSP), communications, and image processing tasks are computationally intensive, and thus demand systems with high computational power. The high computational power is necessary to accomplish the given tasks in real time. Because of the inherent parallelism in DSP tasks, a multiprocessor system is a natural choice for the implementation of such tasks [1].

The technique used to design digital systems at a high level of abstraction is typically referred to as high-level synthesis (HLS). It is used to produce efficient hardware implementations for the given tasks. The HLS process starts with the specification of the algorithmic-level behavior (input-output specifications) of a given digital system, and ends by finding the data path and control-level structure realizing the given I/O specifications [2], [3], [4], [5], [6], [7], [8]. The algorithmic-level behavior is usually represented by a data flow graph (DFG). An example of a DFG is depicted in Fig. 1. Throughout the HLS process, the objective functions and constraints must be met.
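To make the DFG representation concrete, the following is a minimal sketch of one way to hold such a graph in memory. It is not the representation used in the paper; the class name `DFG`, its methods, and the example filter with its delay values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DFG:
    """A data flow graph: nodes are operations, edges carry data between them."""
    # node name -> computational delay in time units (illustrative values)
    delay: dict = field(default_factory=dict)
    # list of (source, destination, number of delay elements on the edge)
    edges: list = field(default_factory=list)

    def add_node(self, name, computational_delay):
        self.delay[name] = computational_delay

    def add_edge(self, src, dst, delays=0):
        if src not in self.delay or dst not in self.delay:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, dst, delays))

    def delay_edges(self):
        """Edges with a nonzero delay count (inter-iteration dependences)."""
        return [e for e in self.edges if e[2] > 0]


# Hypothetical first-order recursive filter y(n) = a*y(n-1) + x(n).
g = DFG()
g.add_node("add", 1)
g.add_node("mul", 2)
g.add_edge("add", "mul", delays=1)  # y(n-1) feeds the multiplier
g.add_edge("mul", "add", delays=0)  # a*y(n-1) feeds the adder in the same iteration
```

Edges with zero delays express intra-iteration precedence, while edges with one or more delays express inter-iteration precedence.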

The process of high-level synthesis consists of several procedures: system definition, scheduling, hardware allocation, and generation of the control system. The most critical procedures are scheduling and hardware allocation. The DFG of a given task undergoes a scheduling process under the constraint that the resulting implementation is rate-optimal. The scheduling process can also consider delay optimality, if needed. The hardware allocation process is then performed such that the resulting implementation is optimal in terms of hardware complexity.
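In the DSP scheduling literature, a schedule is usually called rate-optimal when its iteration period equals the iteration bound of the DFG, that is, the maximum over all loops of the total computational delay divided by the total number of delays in the loop. The sketch below computes that bound by exhaustively enumerating simple cycles. It is a didactic stand-in suitable only for small graphs, not the algorithm used in the paper; the function name, the integer node delays, and the example graph are assumptions.

```python
from fractions import Fraction


def iteration_bound(node_delay, edges):
    """Return max over simple cycles of (sum of node delays) / (sum of edge delays).

    node_delay: dict mapping node -> integer computational delay.
    edges: iterable of (src, dst, delay_count).
    Returns None for an acyclic graph. Exponential in the worst case.
    """
    succ = {}
    for src, dst, d in edges:
        succ.setdefault(src, []).append((dst, d))
    # Each cycle is explored only from its lowest-ranked node to avoid repeats.
    rank = {n: i for i, n in enumerate(sorted(node_delay))}
    best = None

    def dfs(start, node, time_sum, delay_sum, visited):
        nonlocal best
        for nxt, d in succ.get(node, []):
            if nxt == start:
                total_delays = delay_sum + d
                if total_delays == 0:
                    raise ValueError("delay-free cycle: the DFG is not computable")
                ratio = Fraction(time_sum, total_delays)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and rank[nxt] > rank[start]:
                dfs(start, nxt, time_sum + node_delay[nxt],
                    delay_sum + d, visited | {nxt})

    for start in node_delay:
        dfs(start, start, node_delay[start], 0, {start})
    return best


# Hypothetical graph with two loops: A-B (ratio 3/2) and B-C (ratio 3/1).
delays = {"A": 1, "B": 2, "C": 1}
edges = [("A", "B", 0), ("B", "A", 2), ("B", "C", 0), ("C", "B", 1)]
bound = iteration_bound(delays, edges)
```

Here the B-C loop is the critical one, so no schedule of this hypothetical graph can have an iteration period below 3 time units.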

In this paper, a technique for time scheduling and hardware allocation is proposed. The result of this technique is a processor assignment matrix (PAM) representing an implementation via a heterogeneous multiprocessing system.

The paper is organized as follows. A literature survey is presented in Section 2. Section 3 introduces the proposed time scheduling and hardware allocation techniques. Section 4 presents the time complexity analysis of the proposed technique. Section 5 considers some examples and provides a comparison with previous work. Finally, Section 5.2 concludes the paper by highlighting the main contribution of this investigation.

Previous work in high-level synthesis and motivations for proposed study

Synthesis tools that generate valid implementations of digital systems have been available since the 1970s. Many of them target the generation of data paths for general-purpose digital systems.

Some synthesis tools specialize in DSP applications. LAGER, for example, is a data path compiler for DSP applications [9]. The algorithmic behavior specification is expressed as a program in an assembly-like language. The main disadvantage of this synthesis tool is that the

The time scheduling process

The synthesis process starts by time-scheduling all the nodes of the given data flow graph. The scheduling should be carried out under the constraints of minimizing the required number of PEs and satisfying the criteria of rate- and delay-optimality. The proposed scheduling technique is characterized by the following:

  • It exploits intra- and inter-iteration precedence constraints.

  • It is a compile-time synchronous scheduler.

  • It is based on iterative/constructive procedure with

Time complexity analysis

The computational complexity of computing the iteration period (Step 1 of the scheduling algorithm) using the algorithm developed in [25] is O(D × M + D³), where D is the number of edges with nonzero delays. In the general case, the fastest algorithm for computing the shortest distance (hence, for our purposes, the longest distance) between all pairs of nodes of a graph is Johnson’s algorithm [26], which has a time complexity of O(N² log N + N × M) (the time complexity of Step 2).
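The remark that a shortest-distance algorithm also yields longest distances can be illustrated with a short sketch. For brevity it swaps in a Floyd–Warshall-style recurrence, which runs in O(N³) rather than the bound quoted for Johnson’s algorithm, and replaces the minimum with a maximum. Longest distances are only well defined when the graph has no positive-weight cycle, so the sketch checks for that. The function name, the edge weights, and the example graph are hypothetical.

```python
NEG_INF = float("-inf")


def all_pairs_longest(nodes, weighted_edges):
    """All-pairs longest distances via a max-plus Floyd-Warshall recurrence.

    nodes: list of node names.
    weighted_edges: iterable of (src, dst, weight).
    Unreachable pairs keep the value -inf.
    """
    dist = {u: {v: NEG_INF for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = 0
    for u, v, w in weighted_edges:
        dist[u][v] = max(dist[u][v], w)  # keep the heaviest parallel edge
    for k in nodes:
        for i in nodes:
            if dist[i][k] == NEG_INF:
                continue  # no path from i to k, nothing to relax
            for j in nodes:
                candidate = dist[i][k] + dist[k][j]
                if candidate > dist[i][j]:
                    dist[i][j] = candidate
    if any(dist[u][u] > 0 for u in nodes):
        raise ValueError("positive-weight cycle: longest distances are unbounded")
    return dist


# Hypothetical weighted graph; the only cycle a -> b -> c -> a has weight -1.
nodes = ["a", "b", "c"]
weighted_edges = [("a", "b", 2), ("b", "c", 3), ("a", "c", 4), ("c", "a", -6)]
longest = all_pairs_longest(nodes, weighted_edges)
```

In this example the longest distance from a to c is 5 (through b), not the direct edge of weight 4. Equivalently, one could negate every weight and run any all-pairs shortest-path routine that tolerates negative edges.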

The computational complexity of the

Results

In this section, the benchmark fifth-order wave elliptic digital filter is synthesized using the technique presented in this paper. Starting from the DFG corresponding to the given algorithm, the process of synthesis is applied until a hardware implementation is obtained.

Next, a comparison between the proposed technique and other homogeneous and heterogeneous scheduling techniques in terms of the iteration period bound and number of PEs required for the implementation of some benchmark DSP

Conclusions

In this paper, a scheduling technique that supports heterogeneous structural pipelined processing elements has been presented. A system based on heterogeneous pipelined processing elements is found to cost less than one based on homogeneous processing elements. The proposed scheme achieves a lower iteration period bound by taking the node firing period, rather than the node computational delay, as the lower bound on the minimum iteration period. Thus, the iteration bound of an acyclic

References (27)

  • D.J. DeFatta et al.

    Digital signal processing, a system design approach

    (1988)
  • M.C. McFarland et al.

    The high-level synthesis of digital systems

    Proc IEEE

    (1990)
  • P.G. Paulin et al.

    Force-directed scheduling for the behavioral synthesis of ASIC’s

    IEEE Trans Computer-Aided Design

    (1989)
  • L.J. Hafer et al.

    A formal method for the specification, analysis, and design of register-transfer level digital logic

    IEEE Trans Computer-Aided Design

    (1983)
  • M. Balakrishnan et al.

    Allocation of multiport memories in data path synthesis

    IEEE Trans Computer-Aided Design

    (1988)
  • C.-J. Tseng et al.

    Automated synthesis of data paths in digital systems

    IEEE Trans Computer-Aided Design

    (1986)
  • K.K. Parhi et al.

    Synthesis of control circuit in folded pipelined DSP architectures

    IEEE J Solid-State Circuits

    (1992)
  • C.-J. Tseng et al.

    Automated synthesis of data paths in digital systems

    IEEE Trans Computer-Aided Design

    (1986)
  • J.M. Rabaey et al.

    An integrated automated layout generation system for DSP circuits

    IEEE Trans Computer-Aided Design

    (1985)
  • H. Deman

    Cathedral II: A silicon compiler for digital signal processing

    IEEE Design Test

    (1986)
  • B.S. Haroun et al.

    Architectural synthesis for DSP silicon compiler

    IEEE Trans Computer-Aided Design

    (1990)
  • C.T. Hwang et al.

    A formal approach to the scheduling problem in high level synthesis

    IEEE Trans Computer-Aided Design

    (1991)
  • F.F. Yassa et al.

    A silicon compiler for digital signal processing: Methodology, implementation, and applications

    Proc IEEE

    (1987)

    Ali Shatnawi received the B.Sc. and M.Sc. degrees in electrical and computer engineering from the Jordan University of Science and Technology in 1989 and 1992, respectively, and the Ph.D. degree in electrical and computer engineering from Concordia University, Canada, in 1996. He has been on the faculty of the Jordan University of Science and Technology since 1996. He is presently on leave and working as the dean of Information Technology at the Hashemite University, Jordan. His present research covers hardware design, high level synthesis of DSP applications, algorithms, and wireless networks.

    J. Ghanim received the B.S. degree in computer engineering in 1995 from AlBalqa Applied University, Amman, Jordan. He received the M.S. degree in computer engineering in 2000 from Jordan University of Science and Technology, Jordan. He is currently an instructor of computer engineering at the AlBalqa Applied University. He had short term positions at the computer engineering department and the computer center at Jordan University of Science and Technology, and at Samsung Service Center, Jordan. His research interests include high-level DSP synthesis, computer architecture and parallel processing.

    M.O. Ahmad received the B.Eng. degree from Sir George Williams University, Montreal, QC, Canada, and the Ph.D. degree from Concordia University, Montreal, both in electrical engineering. From 1978 to 1979, he was a member of the Faculty of the New York University College, Buffalo. In September 1979, he joined the Faculty of Concordia University, where he was an Assistant Professor of Computer Science. Subsequently, he joined the Department of Electrical and Computer Engineering of the same university, where presently he is a Professor and Chair of the department. His current research interests include the areas of multidimensional filter design, image and video signal processing, nonlinear signal processing, communication DSP, artificial neural networks, and VLSI circuits for signal processing.
