Customizable embedded processor array for multimedia applications
Introduction
Computing hardware design methodology has evolved significantly over the years. As chips grow larger and the complexity of each design increases, flexibility and short time to market, in the form of reprogrammable/reconfigurable chips and systems, become increasingly important [1]. Several Multi-Processor System-on-Chip (MPSoC) and Coarse-Grained Reconfigurable Architecture (CGRA) designs have been proposed in recent years [2], [3], [4]. CGRAs may be preferred for several reasons, such as speed, area, power, or IP reusability [3]. Furthermore, compared to Field Programmable Gate Arrays (FPGAs), CGRAs have a shorter reconfiguration time. CGRAs are suitable for systems that require intensive computation. By adjusting the number and structure of the processing elements in a CGRA, we can obtain an architecture that meets the requirements of a given computation.
Image/video processing is an area where algorithms require intensive computation at high performance. Handling this kind of computation usually requires custom hardware [5]. With today's technology, almost every portable device, e.g. glasses, watches, and smartphones, tends to have a camera. Each device has its own configuration and mostly requires different features. Designing dedicated image processing hardware for every device is time-consuming and not economically feasible. In most devices, image processing tasks are handled by Systems-on-Chip (SoCs) with DSP or GPU cores. A designer who chooses a commercial SoC has to accept what the chip offers in terms of speed and power dissipation. Such architectures may include redundant parts that are never used, and this redundancy leads to extra chip area and power dissipation. In contrast, implementing an image processing task on a CGRA yields efficient results in terms of area, power dissipation, or speed compared to commercial SoCs [6]. The time to market of an image/video processing system implemented on customizable cores such as CGRAs is shorter than that of a custom Application-Specific Integrated Circuit (ASIC) [7]. Moreover, such systems are easy to adapt to later changes. Consequently, CGRAs are well suited to the image/video processing tasks of low-power, low-cost consumer electronics.
In this paper, we introduce a Customizable Embedded Processor Array for Multimedia Applications (CPAMA). CPAMA consists of a processor array for intensive computation and a host processor for control and coordination with other devices. Our configurable architecture is designed around the nature and requirements of image processing algorithms:
- CPAMA processes a multimedia application as a sequence of image blocks. Hence, we design a configurable processor array that processes all pixels of an image block concurrently.
- Depending on the application, each processor of CPAMA can also be configured according to the position of its pixel within the image block.
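The block-wise processing model described above can be sketched in software as follows. This is a minimal model under stated assumptions, not CPAMA's hardware or toolchain: the block size `b`, the helpers `split_into_blocks`, `pe_role`, and `process_image`, and the "border"/"interior" roles are hypothetical names chosen for illustration. The two inner loops stand for a single step in which all processors of the array work on their pixels concurrently.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
Block = Tuple[int, int, Image]  # (top row, left column, pixels)


def split_into_blocks(img: Image, b: int) -> List[Block]:
    """Cut the image into b x b blocks in raster order.

    Assumes the image height and width are multiples of b.
    """
    h, w = len(img), len(img[0])
    if h % b or w % b:
        raise ValueError("image dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, h, b):
        for left in range(0, w, b):
            pixels = [row[left:left + b] for row in img[top:top + b]]
            blocks.append((top, left, pixels))
    return blocks


def pe_role(r: int, c: int, b: int) -> str:
    """Hypothetical position-dependent configuration of the processor at (r, c):
    processors on the block border might, for example, need extra code to
    exchange data with neighbouring blocks."""
    if r in (0, b - 1) or c in (0, b - 1):
        return "border"
    return "interior"


def process_image(img: Image, b: int, kernel: Callable[[int, str], int]) -> Image:
    """Apply `kernel(pixel, role)` to every pixel, block by block.

    In hardware all b*b processors would work on a block concurrently;
    here the two inner loops simply model that single parallel step.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for top, left, pixels in split_into_blocks(img, b):
        for r in range(b):
            for c in range(b):
                out[top + r][left + c] = kernel(pixels[r][c], pe_role(r, c, b))
    return out
```

For example, `process_image(img, 8, lambda p, role: 255 - p)` would invert an image whose dimensions are multiples of 8, while a kernel that inspects `role` emulates processors configured differently by position.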
This paper is organized as follows. In Section 2, we review related architectures in the literature and highlight how they differ from the proposed CPAMA. In Section 3, we explain the basic concepts underlying the CPAMA design. In Section 4, we present the configurable hardware architecture of CPAMA in detail. In Section 5, we present our case study implementations and compare them with existing similar architectures. Finally, in Section 6, we give our remarks on the CPAMA architecture and conclude the paper.
Related works
Mei et al. [3] proposed a template-based CGRA called the Architecture for Dynamically Reconfigurable Embedded Systems (ADRES). Coarse-grained reconfiguration refers to reconfiguration at the level of relatively high-level modules, rather than logic blocks or Look-Up Tables (LUTs) as in an FPGA. A design tool, the Dynamically Reconfigurable Embedded System Compiler (DRESC) [8], is used to generate designs for this architecture. Propagating data, in other words performing iterations, is implemented in a stream
Basic concepts of CPAMA
CPAMA is designed mainly to be highly generic and flexible. In every development cycle of CPAMA, the requirements and characteristics of image processing applications have been considered. The register files of the processors, the data-path design, the instruction set of the processors, the communication among the processors, and the FIFO structures were all studied with the image processing domain in mind. CPAMA has a template-based configurable structure. Like any template structure, CPAMA has both fixed and configurable
Hardware design
The hardware side of CPAMA consists of a 2D grid network structure, as shown in Fig. 4. Given the nature of image processing, there is a strong similarity between a 2D signal (an image) and a 2D mesh Network-on-Chip (NoC); therefore, we chose this type of network for CPAMA.
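To illustrate the correspondence between an image and a 2D mesh, the sketch below assumes that the processor at mesh position (row, col) handles the pixel at the same position inside each image block. Under that assumption, every 4-neighbour of a pixel that lies in the same block is handled by a processor one hop away in the mesh. The mapping and the names `mesh_neighbors`, `pixel_to_node`, and `neighbours_are_one_hop` are hypothetical, chosen for illustration rather than taken from the CPAMA design.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (row, col) of a processor in the mesh


def mesh_neighbors(node: Node, rows: int, cols: int) -> List[Node]:
    """North, south, west and east neighbours of a node in a rows x cols
    2D mesh. Unlike a torus, a mesh has no wrap-around links, so border
    nodes have fewer neighbours."""
    r, c = node
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in candidates if 0 <= nr < rows and 0 <= nc < cols]


def pixel_to_node(y: int, x: int, rows: int, cols: int) -> Node:
    """Assumed mapping: when the image is tiled into rows x cols blocks, pixel
    (y, x) is handled by the processor at its position inside the block."""
    return (y % rows, x % cols)


def neighbours_are_one_hop(y: int, x: int, rows: int, cols: int) -> bool:
    """Check that every 4-neighbour of pixel (y, x) lying in the same block
    is handled by a processor that is a direct mesh neighbour."""
    node = pixel_to_node(y, x, rows, cols)
    block = (y // rows, x // cols)
    for ny, nx in [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]:
        if ny < 0 or nx < 0 or (ny // rows, nx // cols) != block:
            continue  # neighbour lies outside the image or in another block
        if pixel_to_node(ny, nx, rows, cols) not in mesh_neighbors(node, rows, cols):
            return False
    return True
```

Neighbours that fall in an adjacent block are skipped here; in such a design they would have to be fetched over the network, which is one reason border processors might be configured differently.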
One processor is connected to each node, and data communication among the processors is handled by routers. Image data are delivered through the network either by FIFOs or by routers. The FIFOs are placed in the processors and deliver data in one (vertical) direction.
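The text does not specify the routing algorithm or the FIFO timing. The following sketch therefore assumes dimension-order (XY) routing, a common choice for 2D meshes, and models each vertical FIFO chain as a shift register that moves one row of a block down per cycle. Both `xy_route` and `load_block_via_fifos` are hypothetical helpers, not part of the published design.

```python
from collections import deque
from typing import Deque, List, Tuple

Node = Tuple[int, int]  # (row, col) of a router in the mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-order (XY) routing: travel along the row until the column
    matches, then along the column until the row matches.
    Returns every router visited, including src and dst."""
    r, c = src
    path = [(r, c)]
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path


def load_block_via_fifos(block: List[List[int]]) -> List[Deque[int]]:
    """Model one vertical FIFO chain per column of processors.

    Each cycle a new row enters at the top and everything already in the
    chain moves one processor down. Rows are fed bottom-first, so after
    len(block) cycles the processor at depth i holds row i of the block.
    """
    rows, cols = len(block), len(block[0])
    chains: List[Deque[int]] = [deque(maxlen=rows) for _ in range(cols)]
    for cycle in range(rows):
        incoming = block[rows - 1 - cycle]  # bottom row is fed first
        for c in range(cols):
            chains[c].appendleft(incoming[c])  # index 0 is the top processor
    return chains
```

Under these assumptions, a packet's hop count equals the Manhattan distance between source and destination, and loading a block of R rows through the FIFOs takes R cycles regardless of the number of columns, because all column chains shift in parallel.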
Case studies
We evaluated the performance of CPAMA by implementing four algorithms: dot product, TIFF to gray-level image transformation (TIFF2BW) [40], Inverse Discrete Cosine Transform (IDCT), and block matching.
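For reference, the sketch below gives straightforward software versions of the four benchmark kernels, of the kind that could serve as golden models when checking a hardware mapping. They are textbook formulations, not the CPAMA implementations: the gray-level weights (30/59/11 percent, rounded from the ITU-R BT.601 luma coefficients) are an assumption and may differ from those of the TIFF2BW benchmark in [40], block matching uses a full search with a sum-of-absolute-differences (SAD) cost as an assumed criterion, and the IDCT is the direct O(N^4) orthonormal form rather than a fast algorithm.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def dot_product(a: Sequence[int], b: Sequence[int]) -> int:
    """Multiply-accumulate over two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x * y for x, y in zip(a, b))


def rgb_to_gray(r: int, g: int, b: int) -> int:
    """Integer gray level; the 30/59/11 percent weights are an assumption,
    rounded from the ITU-R BT.601 luma coefficients 0.299/0.587/0.114."""
    return (30 * r + 59 * g + 11 * b) // 100


def idct_2d(coeffs: List[List[float]]) -> List[List[float]]:
    """Direct N x N 2D inverse DCT (inverse of the orthonormal DCT-II)."""
    n = len(coeffs)

    def alpha(k: int) -> float:
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[x][y] = sum(
                alpha(u) * alpha(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for u in range(n)
                for v in range(n)
            )
    return out


def sad(a: Matrix, b: Matrix) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def block_match(cur: Matrix, ref: Matrix, top: int, left: int,
                search: int) -> Tuple[int, int, int]:
    """Full-search block matching for a square block `cur` located at
    (top, left): return the displacement (dy, dx) within +/- `search`
    pixels that minimises the SAD against the reference frame, and its cost."""
    n = len(cur)
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate would fall outside the reference frame
            cand = [row[x:x + n] for row in ref[y:y + n]]
            cost = sad(cur, cand)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best
```

All four kernels apply the same operation independently to many pixels or coefficients, which is the kind of data parallelism a processor array with one processor per pixel is meant to exploit.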
Conclusion
Our proposed architecture, CPAMA, is a highly configurable processor array targeted at low-power, low-cost image/video processing devices. Compared with ADRES, CPAMA has shown better performance in the TIFF2BW application and comparable performance in the IDCT application in terms of energy consumption, throughput, and area occupation. We attribute this to CPAMA occupying only the hardware necessary for a given application, which is achieved by considering the nature of image processing in every development cycle
Acknowledgment
The authors would like to thank Mr. Gökhan Işık for his recommendations on ASIC implementation, and Dr. Salih Bayar for his help on partial reconfiguration techniques in FPGAs.
References (47)
- et al., A dynamically reconfigurable communication architecture for multicore embedded systems, J. Syst. Archit. (2012).
- et al., An industrial view of electronic design automation, IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. (2000).
- et al., Coarse-grained reconfigurable array architectures.
- B. Mei, S. Vernalde, D. Verkest, H. De Man, R. Lauwereins, ADRES: An architecture with tightly coupled VLIW processor...
- et al., Adaptive multiprocessor system-on-chip architecture: new degrees of freedom in system design and runtime support.
- et al., Accelerating embedded image processing for real time: a case study, J. Real-Time Image Process. (2013).
- et al., Still image processing on coarse-grained reconfigurable array architectures, J. Signal Process. Syst. (2010).
- et al., A system on a chip architecture of an H.264/AVC coprocessor for DVB-H and DMB applications, IEEE Trans. Consum. Electron. (2007).
- et al., ADRES & DRESC: Architecture and compiler for coarse-grain reconfigurable processors.
- A. Marshall, T. Stansfield, I. Kostarnov, J. Vuillemin, B. Hutchings, A reconfigurable arithmetic array for multimedia...
- Customising a processor architecture for multimedia applications, Electron. Syst. Softw.
- Architecture of a fully pipelined real-time cellular neural network emulator, IEEE Trans. Circuits Syst. I: Reg. Pap.
- A CNN-specific integrated processor, EURASIP J. Adv. Signal Process.
Cited by (2)
- PulseDL: A reconfigurable deep learning array processor dedicated to pulse characterization for high energy physics detectors, Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (2020).
Citation excerpt: In the inference phase, the majority of mathematical operations in the network are multiply-accumulate (MAC), so it is important to improve the efficiency of the MAC operations. Several system architectures are possible to finish the task, such as the single CPU solution [15], the many-core solution [16] and the array processor solution [17]. Considering the particular demand and system complexity, we choose the customized array processor as our overall hardware architecture.
- Lifting-based fractional wavelet filter: Energy-efficient DWT architecture for low-cost wearable sensors, Advances in Multimedia (2020).
1 Anka Microelectronic Systems.