Communication storage optimization for static dataflow with access patterns under periodic scheduling and throughput constraint

https://doi.org/10.1016/j.compeleceng.2014.05.002

Abstract

We address a recently introduced static dataflow model: the Static Dataflow with Access Patterns (SDF-AP) model. For this model we present (1) a generalization of an existing regular periodic scheduling scheme to regular 1-periodic scheduling for flexibility to achieve a smaller schedule period and additional room for optimization on communication storage; (2) a method based on Integer Linear Programming (ILP) to minimize communication buffers under periodic scheduling and user-specified throughput constraints. Experimental results on a set of test cases show that buffer sizes using this approach can be reduced dramatically when compared to the traditional SDF models. The optimal sizing result may serve as an important criterion to evaluate and fine-tune any heuristics-based buffer sizing approach for the SDF-AP model of computation.

Introduction

In the Static (or Synchronous) Dataflow (SDF) model of computation [1], data are passed in the form of tokens among actors. Actors perform computation on tokens and in the process they may create new tokens. This model of computation is typically represented as a graph, in which nodes represent actors and edges denote inter-actor communication of data. Edges are also commonly called “channels” connecting actor “ports”. SDF models require that the number of tokens consumed and produced in a single firing (also known as consumption and production rates, respectively) be specified at design time. For instance, a 2:1 decimation filter actor with one input port and one output port would consume two tokens from its input port when firing and produce one result to its output port. When all the rates in an SDF graph are one, it is homogeneous; otherwise it is multi-rate.

Given that rates are specified statically, SDF models can be analyzed to verify properties such as consistency of production and consumption rates [1] and to ensure that enough communication buffer space is allocated for correct execution (e.g. deadlock freedom for self-timed execution). SDF is inherently an untimed model of computation. It is common, however, to associate execution times with actors to enable static timing analysis. SDF models have been widely used for specifying embedded real-time applications in the domains of multimedia and digital signal processing.
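
As an illustration of the consistency check mentioned above, the following is a minimal sketch that solves the SDF balance equations (for every channel, firings of the producer times its production rate must equal firings of the consumer times its consumption rate) and returns the smallest integer repetition vector. The function name `repetition_vector`, the tuple layout of `channels`, and the assumption of a connected graph are choices made for this example and are not taken from the paper.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def repetition_vector(actors, channels):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors:   list of actor names (the graph is assumed to be connected)
    channels: list of (src, prod_rate, dst, cons_rate) tuples
    Returns the smallest positive integer solution as a dict,
    or None if the rates are inconsistent.
    """
    # Undirected adjacency so firing ratios can be propagated both ways.
    neighbours = {a: [] for a in actors}
    for src, prod, dst, cons in channels:
        neighbours[src].append((dst, Fraction(prod, cons)))
        neighbours[dst].append((src, Fraction(cons, prod)))

    # Fix the first actor at one firing and propagate along channels.
    q = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in neighbours[a]:
            expected = q[a] * ratio
            if b not in q:
                q[b] = expected
                stack.append(b)
            elif q[b] != expected:
                return None  # two paths disagree: rates are inconsistent

    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(f * scale) for a, f in q.items()}


# A source feeding a 2:1 decimation filter feeding a sink.
chain = [("src", 1, "dec", 2), ("dec", 1, "sink", 1)]
print(repetition_vector(["src", "dec", "sink"], chain))
# {'src': 2, 'dec': 1, 'sink': 1}
```

In this example the source must fire twice per iteration so that the decimation filter can fire once. Two parallel channels between the same pair of actors with conflicting rates would make the function return `None`.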

Systems specified using SDF models can be implemented as either software [2], [3] or hardware [4], [5]. Schedule and throughput vary with the amount of memory allocated for inter-actor communication. Analysis techniques for these models generally overestimate communication storage requirements, which is detrimental for embedded real-time systems where memory is at a premium. The Static Dataflow with Access Patterns (SDF-AP) model of computation was recently introduced to address this deficiency [6]. SDF-AP models specify in addition to token rates the specific cycles (relative to the start of the firing of an actor instance) in which individual tokens are produced or consumed. The potential of saving communication storage using SDF-AP under the same throughput constraint as that of the corresponding SDF is also demonstrated in [6].

SDF models can be executed in a self-timed manner: an actor fires once the required number of tokens is available on its input channels and enough vacant space exists on its output channels to hold the tokens it will produce. FIFO (First In, First Out) buffers are typically used for inter-actor communication. An actor reads its inputs from the respective FIFOs in the order in which the input tokens were produced and writes its output tokens in order to its production FIFOs. Actors can use handshaking to implement the self-timing: an actor stalls until the necessary resources (both input tokens and output space) are available, and its neighbors inform it when the necessary inputs have been produced or output space has been freed up. Self-timed implementations are common, but it is also possible to statically schedule SDF models.
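
The firing rule described above can be sketched as a small untimed simulation over bounded FIFOs. This is only an illustrative sketch: the helper names (`can_fire`, `fire`, `run_self_timed`), the dictionary layout of `channels`, and the policy of firing one ready actor at a time in list order are assumptions of this example, not part of the paper.

```python
def can_fire(actor, channels, tokens, capacity):
    """SDF firing rule: every input has enough tokens, every output enough space."""
    for name, (src, prod, dst, cons) in channels.items():
        if dst == actor and tokens[name] < cons:
            return False  # not enough input tokens yet
        if src == actor and tokens[name] + prod > capacity[name]:
            return False  # not enough vacant output space
    return True


def fire(actor, channels, tokens):
    """Consume from all input FIFOs and produce to all output FIFOs."""
    for name, (src, prod, dst, cons) in channels.items():
        if dst == actor:
            tokens[name] -= cons
        if src == actor:
            tokens[name] += prod


def run_self_timed(actors, channels, capacity, max_firings):
    """Fire ready actors one at a time; stop on deadlock or after max_firings."""
    tokens = {name: 0 for name in channels}  # all FIFOs start empty
    trace = []
    while len(trace) < max_firings:
        ready = [a for a in actors if can_fire(a, channels, tokens, capacity)]
        if not ready:
            break  # no actor can proceed: deadlock
        fire(ready[0], channels, tokens)
        trace.append(ready[0])
    return trace


# One channel "c" from a source (1 token per firing) to a 2:1 decimator.
channels = {"c": ("src", 1, "dec", 2)}
print(run_self_timed(["src", "dec"], channels, {"c": 2}, 6))
# ['src', 'src', 'dec', 'src', 'src', 'dec']
print(run_self_timed(["src", "dec"], channels, {"c": 1}, 6))
# ['src']  (deadlock: the FIFO can never hold the two tokens "dec" needs)
```

The second call shows why buffer sizing matters for correctness as well as throughput: with a capacity of one token the decimator can never fire.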

Whether self-timed or statically scheduled, actors in an SDF model share one characteristic – they do not execute until all necessary inputs are available and all required output space is free. Because it takes time to create tokens and fill output FIFOs, actors are guaranteed to be stalled some portion of the time while one of those two processes is going on. One way to speed up execution (and reduce memory requirements for FIFOs) is to exploit those gaps by pipelining actors’ computation by starting work before all input tokens are available and all required output space is free. This is the idea behind SDF-AP. By starting computation when some inputs are available, rather than waiting for all, and by producing tokens when some output space is available, rather than waiting for all required space, actors can overlap computation and memory use more effectively. However, they must also be careful not to consume tokens that are not yet ready or to overwrite previous outputs before they have been consumed. This implies fine-grained scheduling techniques are needed for SDF-AP.
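
To make the potential saving concrete, the following minimal sketch tracks the occupancy of a single channel for one producer firing and one consumer firing whose access patterns are given as per-cycle 0/1 lists. The timing conventions used here (a token written in cycle t is readable from cycle t + 1, and space is released only after the cycle in which the token is read) are simplifying assumptions of this example, and the precise semantics in [6] may differ. The function names are hypothetical.

```python
def peak_occupancy(prod_pattern, cons_pattern, offset):
    """Peak number of tokens stored on one channel.

    prod_pattern / cons_pattern: per-cycle token counts, relative to the
    start of the producer / consumer firing.
    offset: cycle (relative to the producer start) at which the consumer starts.
    Returns None if the consumer would read a token that is not yet available.
    """
    horizon = max(len(prod_pattern), offset + len(cons_pattern))
    produced = consumed = peak = 0
    for t in range(horizon):
        k = t - offset
        reads = cons_pattern[k] if 0 <= k < len(cons_pattern) else 0
        writes = prod_pattern[t] if t < len(prod_pattern) else 0
        # Assumption: only tokens written before cycle t can be read in cycle t.
        if consumed + reads > produced:
            return None
        produced += writes
        # Assumption: space read in cycle t is released after the cycle ends.
        peak = max(peak, produced - consumed)
        consumed += reads
    return peak


def earliest_valid_offset(prod_pattern, cons_pattern):
    """Smallest consumer start offset that never reads an unavailable token."""
    for offset in range(len(prod_pattern) + 1):
        if peak_occupancy(prod_pattern, cons_pattern, offset) is not None:
            return offset
    return None


producer = [1, 1, 1, 1]  # one token written in each of 4 cycles
consumer = [1, 1, 1, 1]  # one token read in each of 4 cycles

# SDF-style: the consumer waits until the producer has finished.
print(peak_occupancy(producer, consumer, offset=4))   # 4
# Access-pattern-aware: the consumer starts as early as is safe.
best = earliest_valid_offset(producer, consumer)
print(best, peak_occupancy(producer, consumer, best))  # 1 2
```

Under these assumptions, overlapping the two firings halves the storage needed on the channel and lets the consumer start three cycles earlier.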

While both exact and approximate buffer size optimization techniques have been studied for SDF models, no detailed buffer sizing technique for SDF-AP models has been reported in the literature. This paper addresses the following question: “How much extra benefit in terms of schedule and buffer size can be gained by exploiting access pattern information?”.

This paper makes two contributions: (1) it demonstrates that access patterns can be used to enhance scheduling with existing fine-grained regular periodic scheduling techniques and quantifies the benefits of the additional information; (2) it proposes a new scheduling scheme – regular 1-periodic scheduling – to overcome some of the restrictions inherent in the regular periodic scheduling scheme. The paper presents a detailed derivation of the scheduling precedence constraints and buffer sizing constraints that guarantee a valid execution. It formulates communication buffer sizing for applications specified in SDF-AP as an optimization problem based on Integer Linear Programming (ILP) [7]. Finally, experiments on a set of benchmark applications illustrate how much communication storage reduction can be obtained when leveraging access patterns. Our ILP-based exact approach always finds the optimal communication storage under a given scheduling policy, but it is computationally challenging. We believe that fast heuristic buffer sizing techniques for SDF-AP models can be derived and that their results can be compared with the exact solution of our approach to evaluate their power and quality.

The rest of the paper is organized as follows. Section 2 surveys existing scheduling and buffer sizing techniques for SDF and SDF-like graphs. Section 3 reviews the SDF-AP model, followed by a discussion of throughput and communication storage in Section 4. Section 5 discusses scheduling constraints and buffer size requirements, and presents two periodic scheduling schemes. Details of our exact ILP-based approach to optimizing communication storage under a throughput constraint for SDF-AP models are presented in Section 6, followed by experimental results in Section 7. Finally, Section 8 concludes the paper and outlines future work.

Section snippets

Background

Buffer sizing and throughput analysis for SDF and SDF-like models has been extensively researched. The system throughput is known to be determined by the implemented schedule, which can be influenced by the amount of communication storage available.

Lee and Messerschmitt [1] introduce Synchronous Dataflow to specify digital signal processing applications for concurrent implementation on parallel hardware. In an SDF model, the number of data samples/tokens produced or consumed on each input and

Preliminaries

An SDF graph is a directed graph, where the set of nodes represents computational elements (actors) and the set of edges captures the communication channels among the actors. An actor communicates with its neighboring actors through channels that connect their ports. Port p can be either input or output, but not both. Let ao(p) be the actor that port p belongs to. For each port, a token rate (production rate, pr(p), for output; consumption rate, cr(p), for input) is specified. An SDF model

Throughput

Throughput (denoted by δ) is a key metric for real-time digital signal processing systems. In practice, it is defined as the number of tokens produced in unit time on a channel or output port. The throughput of a dataflow model essentially describes how often tokens are produced by some actor, which is related to how often the actor executes (with a difference of a factor of port production rate). Looking at different actors in the model may still give different actor execution rates due to

Scheduling

Let $\sigma$ be a schedule. Specifically, when using a single number to index the actor instances, $\sigma(n,u)$ represents the start time (scheduling offset) of instance $n$ of actor $u$, and $\sigma_{n,u}(k)$ denotes the schedule time of the $k$th cycle of execution, $k \in \{1, \ldots, et(u)\}$, of instance $n$ of actor $u$. Notice that $\sigma(n,u) = \sigma_{n,u}(1)$. When $n$ is decomposed into iteration index $i$ and actor instance index $j$ in one iteration, notations $\sigma(i,j,u)$ and $\sigma_{i,j,u}(k)$ represent the above respective notions. For simplicity, in the rest of

Buffer size optimization via ILP

As illustrated in Fig. 2, different throughput constraints may require different communication buffer sizes. For embedded real-time systems, it is important to find the smallest buffer sizes that achieve a specified throughput. In this section, we formulate the buffer size optimization for SDF-AP models under periodic scheduling and throughput constraint as an ILP problem. Recall that schedule period T can be computed based on specified throughput constraint using (16) as discussed in the

Experimental results

In this section, we present the results on benchmarks to show buffer size requirements under different periodic scheduling schemes and how the proposed optimization technique may improve communication storage requirement compared to traditional SDF buffer sizing techniques.

The set of benchmarks studied in this paper consists of 9 applications: H.263 encoder (video frames encoding based on H.263 video compression standard), H.263 decoder (video frame reconstruction), modem digital filter, MP3

Conclusions and future work

In this paper, we studied existing regular periodic scheduling for the SDF and SDF-AP models of computation. We proposed a new flexible periodic scheduling scheme called regular 1-periodic scheduling for the SDF and SDF-AP models. Compared to regular periodic scheduling, regular 1-periodic scheduling offers additional room for buffer size optimization and provides a smaller achievable schedule period.

We further demonstrated that the buffer size optimization problem for SDF-AP under throughput

Acknowledgment

The authors would like to thank the five anonymous reviewers for their valuable revision comments on the manuscript.

Guoqiang Wang is Staff Software Engineer at National Instruments Corporation. He holds a Ph.D. degree in Electrical Engineering and Computer Sciences from the University of California, Berkeley and a bachelor’s degree in Mechanical Engineering from Xi’an Jiaotong University. His research interests are in modeling and optimizing distributed, real-time, and embedded systems.

References (23)

  • E.A. Lee et al. Synchronous data flow. Proc IEEE (1987)
  • S. Stuijk et al. Throughput-buffering trade-off exploration for cyclo-static and synchronous dataflow graphs. IEEE Trans Comput (2008)
  • Wiggers M, Bekooij M, Smit G. Buffer capacity computation for throughput constrained streaming applications with...
  • Kee H, Bhattacharyya SS, Kornerup J. Efficient static buffering to guarantee throughput-optimal FPGA implementation of...
  • Williamson M, Lee E. Synthesis of parallel hardware implementations from synchronous dataflow graph specifications. In:...
  • Tripakis S, Andrade H, Ghosal A, Limaye R, Ravindran K, Wang G, et al. Correct and non-defensive glue design using...
  • T.H. Cormen et al. Introduction to algorithms (2001)
  • E.A. Lee et al. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Trans Comput (1987)
  • Bilsen G, Engels M, Lauwereins R, Peperstraete J. Cyclo-static data flow. In: IEEE int conf on acoustics, speech, and...
  • Goddard S, Jeffay K. The synthesis of real-time systems from processing graphs. In: Proc of the 5th IEEE int symp on...
  • M.H. Wiggers, M.J.G. Bekooij, G.J.M. Smit, Efficient Computation of Buffer Capacities for Cyclo-Static Dataflow Graphs,...
    Randy Allen is Chief Architect at National Instruments. He co-authored the book “Optimizing Compilers for Modern Architectures”, was the founder of Catalytic, Inc., and has served as a consultant on compiler optimization to IBM, Apple, and Mentor Graphics, among others. He earned his A.B. in Chemistry from Harvard University (summa cum laude) and his PhD in Mathematical Sciences from Rice University.

    Hugo Andrade is Principal Architect in Software Marketing at National Instruments, focusing on advanced (CPS) applications based on the LabVIEW RIO Architecture. Prior to this, he was founding technical manager of the NI Berkeley R&D site, working on advanced system-level design tools. Hugo holds 59 patents. He earned BS ECE&CS and MS ECE degrees from UT Austin.

    Alberto Sangiovanni-Vincentelli is the Buttner Chair of EECS at UC Berkeley. He is a cofounder of Cadence and Synopsys and a member of the UTC Technology Advisory Council and the NAE, and has received, among others, the Kaufman Award for pioneering contributions to EDA and the IEEE/RSE Maxwell Medal for groundbreaking contributions with exceptional impact on the development of electronics.

    Reviews processed and proposed for publication to Editor-in-Chief by Associate Editor Dr. Jian Li.
