Performance of reconfigurable architectures for image-processing applications

https://doi.org/10.1016/S1383-7621(03)00065-1

Abstract

Reconfigurable architectures combine a programmable-visible interface and the high-level aspects of a computer’s design. The goal of this work is to explore the architectural behaviour of remote reconfigurable systems that are part of general-purpose computers. Our approach analyses various issues arising from the connection of processors with FPGA-based microarchitecture to an existing commodity microprocessor via a standard bus. The quantitative evaluation considers image-processing applications and shows that the maximum performance depends on the amount of data processed by the reconfigurable hardware. Taking images with 256 × 256 pixels, a moderate FPGA capacity of 1E+5 logic blocks provides two orders of magnitude of performance improvement over a Pentium III processor for most of our benchmarks. However, the performance benefits exhibited by reconfigurable architectures may be deeply influenced by some design parameters. This paper studies the impact of hardware capacity, reconfiguration time, memory organisation, and bus bandwidth on the performance achieved by FPGA-based systems. Those image-processing benchmarks that can exhibit high performance improvement would require about 150 memory banks of 256 bytes each and a bus bandwidth as high as 30 GB/s. This quantitative approach can be applied to the design of high-performance reconfigurable coprocessors for multimedia applications.

Introduction

The abstract programming model of instruction-set architectures allows an application developer to concentrate on the algorithm in hand rather than on its hardware implementation. When reconfigurable systems are used, the hardware structure can be made accessible to the programmer, operating system or compiler [14], [29], [31]. This type of programming model can implement abstract operations and provides different levels of performance by allowing the software to try out several hardware designs. Reconfigurable architectures are used more efficiently than instruction-set architectures when the algorithm has a high degree of regularity and exploits high levels of data parallelism. Additionally, in many applications whose computationally intensive operations are mapped to reconfigurable architectures, performance may be higher than that exhibited by instruction-set computers. One of the open research problems of configurable computing is understanding the limitations of configurable devices when competing with microprocessors [3], [15], [16].

Image processing is an important class of data-parallel and computation-intensive applications that is becoming one of the dominant computing workloads [10], [11]. These applications operate on data to be presented visually or extract image features for tasks such as object detection, tracking, and classification. Image processing is usually considered part of the multimedia domain. In general, multimedia applications are characterised by the streaming nature of their data accesses and by large working sets. Workloads with these features do not make effective use of microprocessor hardware resources. Alternatively, configurable computing may provide more efficient hardware solutions [2], [25].

In spite of the large amount of attention given to multimedia workloads by investigators in the domain of configurable computing, there is little quantitative understanding of the performance of such applications on FPGA-based coprocessors. A major challenge for such studies is the enormous cost of developing applications using standardised hardware description languages [12], [15], [19].

The goal of this work is to explore the architectural behaviour of reconfigurable FPGA-based coprocessors that are part of general-purpose computers. Their performance is compared with that achieved by a Pentium III processor.

Further motivating our study is the large role that the memory hierarchy plays in limiting performance. From our experiments, we can explain why the performance of a reconfigurable coprocessor is highly dependent on an efficient memory organisation and the associated bandwidth of data transmission. We observe that image-processing applications demand not only sufficient logic blocks and memory size, but also a large number of memory banks and high-bandwidth buses. Since the variation of bus bandwidths encountered in contemporary computer systems is substantial, we suggest that reconfigurable architectures are more efficient when placed as close to the processor as possible without being part of its data-path. In addition, we can justify the performance improvement of current reconfigurable coprocessors over superscalar processors and predict the performance of future FPGA-based systems.
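The scale of these memory and bandwidth requirements can be checked with simple arithmetic. The sketch below uses the figures quoted in the abstract (256 × 256 images, 150 banks of 256 bytes each, a 30 GB/s bus). The one-byte-per-pixel depth, the decimal reading of GB, and all function names are assumptions made for illustration and are not taken from the paper.

```python
# Back-of-the-envelope figures for the requirements quoted in the abstract.
IMAGE_SIDE = 256          # pixels per side (from the abstract)
BYTES_PER_PIXEL = 1       # ASSUMPTION: 8-bit greyscale pixels
BANKS = 150               # number of memory banks (from the abstract)
BANK_BYTES = 256          # bytes per bank (from the abstract)
BUS_BYTES_PER_S = 30e9    # 30 GB/s, ASSUMING 1 GB = 1e9 bytes


def image_bytes(side=IMAGE_SIDE, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one square image in bytes."""
    return side * side * bytes_per_pixel


def local_memory_bytes(banks=BANKS, bank_bytes=BANK_BYTES):
    """Total capacity of all memory banks in bytes."""
    return banks * bank_bytes


def transfer_seconds(n_bytes, bandwidth=BUS_BYTES_PER_S):
    """Ideal time to move n_bytes over a bus at peak bandwidth."""
    return n_bytes / bandwidth


if __name__ == "__main__":
    size = image_bytes()
    print("image size (bytes):", size)
    print("bank capacity (bytes):", local_memory_bytes())
    print("ideal transfer time (s):", transfer_seconds(size))
```

Under these assumptions one image occupies 65,536 bytes, the 150 banks together hold 38,400 bytes, and moving one image over the bus at peak rate takes roughly 2.2 µs.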

In Section 2, we provide brief explanations of some terms and concepts used for the analysis of reconfigurable architectures, along with a review of previous contributions on this topic. Then, Section 3 describes the experimental methodology used in the performance analysis. We perform detailed simulation of four image-processing benchmarks. Sections 4 (Improving computer performance), 5 (Analysis of memory system performance), and 6 (FPGA-based coprocessor design) present in detail a comparison of several FPGA-based architectures with a high-performance instruction-set processor. They provide a thorough analysis of four important reconfigurable system parameters in order to support data-parallel and computation-intensive applications: hardware complexity, reconfiguration time, number of memory banks, and host bus bandwidth. Finally, Section 7 concludes the paper.

Section snippets

Reconfigurable architectures and previous work

Reconfigurable computing systems arose from the development of programmable electronic circuits of large hardware complexity. One of their important characteristics is that the hardware architecture can adopt a wide range of forms. In this way, the same circuit can implement a variety of different microarchitectures [9], [32].

A reconfigurable architecture consists of an array of uncommitted hardware blocks that the end user, operating system

Experimental methodology

This paper studies the reconfigurable system model shown in Fig. 3. Its architecture is composed of a high-performance general-purpose processor and a system based on FPGAs and memory blocks. The reconfigurable coprocessor has a remote interface since the general-purpose processor and the reconfigurable hardware are connected through a bus. The processor and the reconfigurable data-path may support data processing in parallel or concurrently. Customising the hardware configuration of the remote

Improving computer performance

All the benchmarks were coded in five different ways, corresponding to five hardware microarchitectures, which were called uAx (x=1,…,5). These microarchitectures are targeted at FPGA devices. The combinations of hardware techniques used for each design alternative are shown in Table 3. Each of them offers a different way to exploit the data-level parallelism inherent to the benchmarks. Note that the microarchitectures can be multicycle or pipelined. Additionally, many of them provide

Analysis of memory system performance

This section first evaluates the impact of memory organisation on the performance of reconfigurable architectures that follow the architectural model shown in Fig. 1, and then considers why the bank count may be important in achieving high performance. It ends by analysing the influence of host bus on performance improvement.
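One way to see why the bank count can matter is a toy model of interleaved memory. The sketch below assumes low-order interleaving (bank index = address modulo the number of banks) and one access per bank per cycle. Both the mapping and the function names are illustrative assumptions, not the memory organisation evaluated in the paper.

```python
from collections import Counter


def bank_of(address, n_banks):
    """Bank holding the address under low-order interleaving (assumed scheme)."""
    return address % n_banks


def offset_in_bank(address, n_banks):
    """Position of the address inside its bank."""
    return address // n_banks


def access_cycles(addresses, n_banks):
    """Cycles needed to serve all addresses if each bank serves one per cycle."""
    load = Counter(bank_of(a, n_banks) for a in addresses)
    return max(load.values()) if load else 0


if __name__ == "__main__":
    row = list(range(8))                      # 8 neighbouring pixels in a row
    column = [i * 256 for i in range(8)]      # 8 pixels in a column, 256-pixel rows
    for banks in (1, 4, 8):
        print(banks, "banks: row", access_cycles(row, banks),
              "cycles, column", access_cycles(column, banks), "cycles")
```

In this model, eight neighbouring pixels are served in a single cycle once eight banks are available, while a column access with a stride of 256 still collides on one bank. This suggests that both the number of banks and the data layout influence the achievable parallelism.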

FPGA-based coprocessor design

With the quantitative evaluation shown in this paper, the realistic performance improvement that can be obtained by currently available FPGA-based coprocessors can be justified. This conclusion has been verified through program implementation on the PCI board RC1000-PP. This board cost approximately €2500 in 2003. It features four banks of local memory and one FPGA device, which can support a hardware complexity equivalent to 1.2E+4 2-bit slices [5]. There is sufficient

Conclusions

The goal of this work was the study of the architectural behaviour and requirements of remote reconfigurable systems. FPGA-based coprocessors can achieve a speed-up of two orders of magnitude when a Pentium III is taken as the base system. The influence of four characteristics of reconfigurable architectures on this performance level has been analysed: hardware capacity, reconfiguration time, local memory organisation, and host bus bandwidth.
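The way these characteristics interact can be illustrated with a simple serial cost model, in which reconfiguration time and bus transfer time are added to the coprocessor's computation time before comparing with the processor. This is a hypothetical sketch: the formula, the parameter names, and the assumption that nothing overlaps are illustrative and are not the paper's evaluation method.

```python
def speedup(t_cpu, t_fpga, t_reconfig=0.0, bytes_moved=0,
            bus_bandwidth=float("inf")):
    """Speed-up of a coprocessor over a base processor (illustrative model).

    t_cpu         -- execution time on the base processor, in seconds
    t_fpga        -- computation time on the reconfigurable hardware, in seconds
    t_reconfig    -- time to load the hardware configuration, in seconds
    bytes_moved   -- data transferred over the host bus, in bytes
    bus_bandwidth -- host bus bandwidth, in bytes per second

    ASSUMPTION: reconfiguration, transfer and computation run one after
    another with no overlap, which is a pessimistic simplification.
    """
    t_transfer = bytes_moved / bus_bandwidth
    return t_cpu / (t_reconfig + t_transfer + t_fpga)


if __name__ == "__main__":
    # Hypothetical timings, chosen only to show the trend.
    print(speedup(t_cpu=100.0, t_fpga=1.0))
    print(speedup(t_cpu=100.0, t_fpga=1.0, t_reconfig=1.0))
    print(speedup(t_cpu=100.0, t_fpga=1.0, bytes_moved=200, bus_bandwidth=100))
```

In this model, a raw computational advantage of two orders of magnitude shrinks quickly once reconfiguration or transfer time becomes comparable to the coprocessor's own computation time. A design that overlaps data transfer with computation would fare better than this serial estimate.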

Using real image-processing applications and

Acknowledgements

This work was supported by the Ministry of Education and Science of Spain under contract TIC98-0322-C03-02, the “Gobierno de Canarias”, Xilinx and Celoxica. The author thanks the reviewers for their useful comments and suggestions.

Domingo Benitez is full professor of Computer Architecture and Technology at the University of Las Palmas G.C. (Spain) where he has been since 1987. He received the BS degree in Physics from the University of La Laguna (Spain) in 1987, and the Ph.D. in Computer Science from the University of Las Palmas G.C. in 1994. His research interests include computer architecture, configurable computing, special purpose processors, and embedded systems.

References (36)

  • D. Benitez, Modular architecture for custom-built systems oriented to real-time computer vision: application to color recognition, Journal of System Architecture (1997)
  • D. Benítez et al., Reactive computer vision system with reconfigurable architecture
  • I. Bolsens, Challenges and opportunities for FPGA platforms
  • D.A. Buell et al. (1996)
  • Celoxica, RC1000-PP Hardware Reference Manual (1998)
  • Celoxica, Handel-C Language Reference Manual (1998)
  • T.J. Callahan et al., The Garp architecture and C compiler, IEEE Computer (2000)
  • Y. Chou, P. Pillai, H. Schmit, J.P. Shen, PipeRench implementation of the instruction path coprocessor, in: Proc. Int....
  • K. Compton et al., Reconfigurable computing: a survey of systems and software, ACM Computing Surveys (2002)
  • T.M. Conte et al., Challenges to combining general-purpose and multimedia processors, IEEE Computer (1997)
  • K. Diefendorff et al., How multimedia workloads will change processor design?, IEEE Computer (1997)
  • G. Gent et al., An FPGA-based custom coprocessor for automatic image segmentation applications
  • S.C. Goldstein et al., PipeRench: a reconfigurable architecture and compiler, IEEE Computer (2000)
  • P. Green et al., An evaluation of an FPGA run-time support system
  • R. Hartenstein, Reconfigurable computing: a new business model––and its Impact on SoC Design (invited keynote), in:...
  • J.R. Hauser, Augmenting a Microprocessor with Reconfigurable Hardware, Ph.D. Thesis, University of California,...
  • C. Iseli et al., Spyder: A SURE (Superscalar and Reconfigurable) processor, Journal of Supercomputing (1995)
  • T. Komarek et al., Array architectures for block matching algorithms, IEEE Transactions on Circuits and Systems (1989)