Efficient control generation for mapping nested loop programs onto processor arrays☆
Section snippets
Introduction and related work
In the last decade, research and development of massively parallel processor arrays has grown dramatically in both academia and industry. Examples of state-of-the-art reconfigurable processor array architectures are RAW [17], PACT-XPP64A [11] and WPPA [9]. Processor array architectures provide an optimal platform for the parallel execution of number-crunching loop programs from fields such as digital signal processing, image processing, and linear algebra. However, due to a lack of
Definitions, notations, and transformations
In this section we first give a brief overview of our existing mapping methodology PARO (see the design flow in Fig. 1), which is based on the polytope model [10] and maps loop nests onto massively parallel architectures. The starting point is an algorithmic description given as a set of recurrence equations, called a piecewise linear algorithm (see Definition 2.1).
Definition 2.1 (PLA). A piecewise linear algorithm consists of a set of N quantified equations S1[I], …, SN[I], where each equation Si[I] is of the form
Control generation
Partitioning not only increases the PLA code size but also introduces a more complex control flow into the program. For efficient parallelization, the iteration-dependent if-conditionals occurring in a given PLA have to be replaced by control variables. A methodology for control generation is therefore needed that specifies the control units and control signals of the processor array.
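The idea of replacing an iteration-dependent condition by a propagated control signal can be illustrated with a minimal sketch. It is not the paper's algorithm: the two-dimensional iteration space, the condition `i == 0`, the allocation of iteration (i, j) to PE j, the linear schedule t = i + j, and all function names are assumptions chosen for illustration. A single global controller emits the signal once, and each PE forwards it to its neighbor through a one-cycle delay register, so no PE needs its own iteration counter.

```python
def reference_condition(n, m):
    """Evaluate the iteration-dependent condition i == 0 directly at every iteration."""
    return {(i, j): (i == 0) for i in range(n) for j in range(m)}


def simulate_control(n, m):
    """Replace the condition by a control signal shifted through the PE array.

    Assumptions (hypothetical, for illustration only):
      - iteration (i, j) runs on PE j at time step t = i + j,
      - the global controller asserts the signal only at t == 0,
      - each PE hands the signal to the next PE with one cycle of delay.
    """
    steps = n + m - 1
    regs = [False] * m          # regs[j] is the control value visible to PE j
    seen = {}
    for t in range(steps):
        # Global controller feeds PE 0; all other PEs receive the delayed value.
        regs = [t == 0] + regs[:-1]
        for j in range(m):
            i = t - j           # iteration executed by PE j in this cycle, if any
            if 0 <= i < n:
                seen[(i, j)] = regs[j]
    return seen


if __name__ == "__main__":
    # The propagated control variable agrees with the original if-condition.
    print(simulate_control(4, 3) == reference_condition(4, 3))
```

Under these assumptions, the value PE j reads at time t left the controller j cycles earlier, which is why it matches the condition `i == 0` without any per-PE counter.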
Conclusions and future work
The processor array specification is derived from the PLA after control generation. In [3], the authors validated their methodology with a case study showing up to a 90% reduction in control path area cost compared to earlier methodologies [1], [6]. This large reduction is attributed to the fact that, in the earlier approaches, counters local to every PE updated the iteration variables, so the cost scaled in proportion to the number of PEs. Our scheduling methodology for partitioning techniques enables
References (19)
- et al., Constructing and exploiting linear schedules with prescribed parallelism, ACM Transactions on Design Automation of Electronic Systems (2002)
- Steven Derrien, Tanguy Risset, Interfacing Compiled FPGA Programs: The MMAlpha Approach, in: Proceedings of the...
- Hritam Dutta, Frank Hannig, Jürgen Teich, Controller Synthesis for Mapping Partitioned Programs on Array Architectures,...
- Hritam Dutta, Frank Hannig, Jürgen Teich, Hierarchical Partitioning for Piecewise Linear Algorithms, in: Proceedings of...
- et al., Hierarchical algorithm partitioning at system level for an improved utilization of memory structures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (1999)
- Anne-Claire Guillou, Patrice Quinton, Tanguy Risset, Hardware Synthesis for Multi-Dimensional Time, in: Proceedings of...
- Frank Hannig, Hritam Dutta, Jürgen Teich, Regular Mapping for Coarse-grained Reconfigurable Architectures, in:...
- Frank Hannig, Jürgen Teich, Design Space Exploration for Massively Parallel Processor Arrays, in: Victor Malyshkin,...
- Dmitrij Kissler, Frank Hannig, Alexey Kupriyanov, Jürgen Teich, Hardware Cost Analysis for Weakly Programmable...
☆ Supported in part by the German Science Foundation (DFG) in project under contract TE 163/13-1.