Performance of reconfigurable architectures for image-processing applications
Introduction
The abstract programming model of instruction-set architectures allows an application developer to concentrate on the algorithm in hand rather than on its hardware implementation. When reconfigurable systems are used, the hardware structure can be accessible to the programmer, operating system or compiler [14], [29], [31]. This type of programming model can implement abstract operations and provides different levels of performance by allowing the software to try out several hardware designs. Reconfigurable architectures are used more efficiently than instruction-set architectures when the algorithm has a high degree of regularity and exploits high levels of data parallelism. Additionally, in many applications with computationally intensive operations that are mapped to reconfigurable architectures, performance may be higher than that exhibited by instruction-set computers. One of the open research problems of configurable computing is understanding the limitations of configurable devices when competing with microprocessors [3], [15], [16].
Image processing refers to an important class of data-parallel and computation-intensive applications that is becoming one of the dominant computing workloads [10], [11]. These applications operate on data to be presented visually or extract image features for tasks such as object detection, tracking, and classification. Image processing is usually considered part of the multimedia domain. In general, multimedia applications are characterised by the streaming nature of their data accesses and their large working sets. Applications with these features do not make effective use of microprocessor hardware resources. Alternatively, configurable computing may provide more efficient hardware solutions [2], [25].
In spite of the large amount of attention given to multimedia workloads by investigators in the domain of configurable computing, there is little quantitative understanding of the performance of such applications on FPGA-based coprocessors. A major challenge for such studies is the enormous cost of developing applications using standardised hardware description languages [12], [15], [19].
The goal of this work is to explore the architectural behaviour of reconfigurable FPGA-based coprocessors that are part of general-purpose computers. Their performance is compared with that achieved by a successful Pentium III processor.
Further motivating our study is the large role the memory hierarchy plays in limiting performance. From our experiments, we can explain why the performance of a reconfigurable coprocessor is highly dependent on an efficient memory organisation and the associated bandwidth of data transmission. We observe that image-processing applications demand not only sufficient logic blocks and memory size, but also a large number of memory banks and high-bandwidth buses. Since the variation of bus bandwidths encountered in contemporary computer systems is substantial, we suggest that reconfigurable architectures are more efficient when placed as close to the processor as possible without being part of its data-path. In addition, we can justify the performance improvement of current reconfigurable coprocessors over superscalar processors and predict the performance of future FPGA-based systems.
In Section 2, we provide brief explanations of some terms and concepts used for the analysis of reconfigurable architectures, along with a review of previous contributions on this topic. Then, Section 3 describes the experimental methodology used in the performance analysis. We perform detailed simulations of four image-processing benchmarks. Sections 4 (Improving computer performance), 5 (Analysis of memory system performance) and 6 (FPGA-based coprocessor design) present in detail a comparison of several FPGA-based architectures with a high-performance instruction-set processor. They provide a thorough analysis of four important reconfigurable system parameters in order to support data-parallel and computation-intensive applications: hardware complexity, reconfiguration time, number of memory banks, and host bus bandwidth. Finally, Section 7 concludes the paper.
Section snippets
Reconfigurable architectures and previous work
Reconfigurable computing systems arose from the development of programmable electronic circuits of large hardware complexity. One of their important characteristics is that the hardware architecture can adopt a wide range of forms. In this way, the hardware can implement a variety of different microarchitectures with the same circuit [9], [32].
A reconfigurable architecture consists of an array of uncommitted hardware blocks that the end user, operating system
Experimental methodology
This paper studies the reconfigurable system model shown in Fig. 3. Its architecture is composed of a high-performance general-purpose processor and a system based on FPGAs and memory blocks. The reconfigurable coprocessor has a remote interface since the general-purpose processor and the reconfigurable hardware are connected through a bus. The processor and the reconfigurable data-path may support data processing in parallel or concurrently. Customising the hardware configuration of the remote
Improving computer performance
All the benchmarks were coded in five different ways, corresponding to five hardware microarchitectures, which were called uAx (x=1,…,5). These microarchitectures are targeted at FPGA devices. The combinations of hardware techniques used for each design alternative are shown in Table 3. Each of them offers a different way to exploit the data-level parallelism inherent to the benchmarks. Note that the microarchitectures can be multicycle or pipelined. Additionally, many of them provide
Analysis of memory system performance
This section first evaluates the impact of memory organisation on the performance of reconfigurable architectures that follow the architectural model shown in Fig. 1, and then considers why the bank count may be important in achieving high performance. It ends by analysing the influence of the host bus on performance improvement.
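As a rough illustration of why both the bank count and the host bus can matter, the following first-order sketch may help. It is not the model used in this paper. It assumes that each local memory bank delivers one word per clock cycle, that the datapath never stalls for any other reason, and that host transfers do not overlap with computation. All function names, parameters and figures are hypothetical.

```python
from math import ceil


def cycles_per_pixel(reads_per_pixel: int, banks: int) -> int:
    """Cycles needed to fetch the operands of one output pixel.

    Assumption: every bank supplies exactly one word per cycle, so the
    reads are spread evenly across the available banks.
    """
    if reads_per_pixel < 1 or banks < 1:
        raise ValueError("reads_per_pixel and banks must be positive")
    return ceil(reads_per_pixel / banks)


def coprocessor_time(pixels: int, reads_per_pixel: int, banks: int,
                     clock_hz: float, bytes_moved: int,
                     bus_bytes_per_s: float) -> float:
    """Estimated seconds for one image: memory-bound compute plus bus transfer.

    Assumption: transfer and compute are strictly sequential (no overlap).
    """
    compute = pixels * cycles_per_pixel(reads_per_pixel, banks) / clock_hz
    transfer = bytes_moved / bus_bytes_per_s
    return compute + transfer


if __name__ == "__main__":
    # Illustrative figures only: a 512x512 8-bit image, a 3x3 window
    # operator (9 reads per pixel), a 50 MHz design, and a 100 MB/s bus
    # carrying the image in and the result out.
    pixels = 512 * 512
    for banks in (1, 2, 4):
        t = coprocessor_time(pixels, 9, banks, 50e6, 2 * pixels, 100e6)
        print(f"{banks} bank(s): {t * 1e3:.2f} ms")
```

Under these assumptions, a 3×3 window operator needs nine reads per pixel, so moving from one bank to four reduces the fetch cost from nine cycles to three per pixel, while the bus term is unaffected by the bank count.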
FPGA-based coprocessor design
The quantitative evaluation shown in this paper justifies the realistic performance improvement that can be obtained with currently available FPGA-based coprocessors. This conclusion has been verified through program implementation on the PCI board RC1000-PP. This board cost approximately €2500 in 2003. It features four banks of local memory and one FPGA device, which can support a hardware complexity equivalent to 1.2E+4 2-bit slices [5]. There is sufficient
Conclusions
The goal of this work was the study of the architectural behaviour and requirements of remote reconfigurable systems. FPGA-based coprocessors can achieve a speed-up of two orders of magnitude when a Pentium III is taken as the base system. The influence of four characteristics of reconfigurable architectures on this performance level has been analysed: hardware capacity, reconfiguration time, local memory organisation, and host bus bandwidth.
Using real image-processing applications and
Acknowledgements
The Ministry of Education and Science of Spain under contract TIC98-0322-C03-02, the “Gobierno de Canarias”, Xilinx and Celoxica supported this work. The author acknowledges the reviewers for their useful comments and suggestions.
Domingo Benitez is full professor of Computer Architecture and Technology at the University of Las Palmas G.C. (Spain) where he has been since 1987. He received the BS degree in Physics from the University of La Laguna (Spain) in 1987, and the Ph.D. in Computer Science from the University of Las Palmas G.C. in 1994. His research interests include computer architecture, configurable computing, special purpose processors, and embedded systems.
References (36)
Modular architecture for custom-built systems oriented to real-time computer vision: application to color recognition. Journal of System Architecture (1997).
Reactive computer vision system with reconfigurable architecture.
Challenges and opportunities for FPGA platforms.
et al. (1996).
RC1000-PP Hardware Reference Manual (1998).
Handel-C Language Reference Manual (1998).
The Garp architecture and C compiler. IEEE Computer (2000).
Y. Chou, P. Pillai, H. Schmit, J.P. Shen, PipeRench implementation of the instruction path coprocessor, in: Proc. Int....
Reconfigurable computing: a survey of systems and software. ACM Computing Surveys (2002).
Challenges to combining general-purpose and multimedia processors. IEEE Computer (1997).
How multimedia workloads will change processor design? IEEE Computer.
An FPGA-based custom coprocessor for automatic image segmentation applications.
PipeRench: a reconfigurable architecture and compiler. IEEE Computer.
An evaluation of an FPGA run-time support system.
Spyder: A SURE (Superscalar and Reconfigurable) processor. Journal of Supercomputing.
Array architectures for block matching algorithms. IEEE Transactions on Circuits and Systems.