Hardware resource utilization optimization in FPGA-based Heterogeneous MPSoC architectures

doi:10.1016/j.micpro.2015.05.006

Microprocessors and Microsystems

Volume 39, Issue 8, November 2015, Pages 1108-1118

https://doi.org/10.1016/j.micpro.2015.05.006

Abstract

Next-generation FPGA circuits will allow the integration of dozens of hard and soft cores, as well as dedicated accelerators, on the same chip. These Heterogeneous Multiprocessor System-on-Chip (Ht-MPSoC) architectures will allow very complex Systems-on-Chip (SoCs) to be designed on a single FPGA chip and will meet modern application requirements in terms of performance/energy consumption ratio. In this paper, we extend existing FPGA-based Ht-MPSoC architectures by sharing hardware accelerators among the cores. In these architectures, cores on the FPGA may have different resources that can be shared in different manners. To explore the large space of possible Ht-MPSoC configurations on an FPGA, the designer needs a fast and accurate exploration tool. For this reason, a Mixed Integer Programming (MIP) model is also proposed to determine the Ht-MPSoC configuration that consumes the fewest HW resources while respecting the application execution time constraints. Using our MIP model, the design space of several hundred private and shared HW accelerators can be explored in a reasonable time with high accuracy.

Introduction

The increase in HW resources in the latest FPGA generation makes it possible to implement extremely complex Heterogeneous Multi-Processor System-on-Chip (Ht-MPSoC) architectures. These architectures combine hardware and/or software cores, application-specific HW accelerators and communication units. The Xilinx Zynq 7000 Extensible Processing Platform (EPP) is an example of such an architecture, embedding a dual-core ARM Cortex A9 processor and tens of thousands of programmable gate arrays [1]. Cyclone V from Altera [2] and SmartFusion2 from Micro-Semi [3] are other examples of Ht-MPSoC. These architectures include one or more hard cores and up to 500 K reconfigurable logic elements to build computational accelerators (Fig. 1).

Thanks to this reconfigurable area, it is possible to design either a Symmetric Ht-MPSoC (SHt-MPSoC), in which all the processors have the same number of private and shared HW accelerators, or an Asymmetric Ht-MPSoC (AHt-MPSoC), in which the HW accelerators attached to the processors differ from one processor to another. Figs. 2 and 3 show two examples of Ht-MPSoC with 4 processors (P1 to P4). Fig. 2 highlights a 4-core SHt-MPSoC architecture, in which the processors have the same type and number of HW accelerators: each processor has one identical private HW accelerator (named Pr) and shares m accelerators with the other processors. Fig. 3 gives an example of an AHt-MPSoC. In this architecture, P1, P3 and P4 each have one private accelerator (named Pr i). These private accelerators can be similar or different. P2 has no private accelerator. P1 and P2 share the same accelerator, named “Sh 1” in Fig. 3, whereas P2, P3 and P4 share another HW accelerator, named “Sh 2”. In an AHt-MPSoC, critical applications (or tasks, in a multitasked system) are executed by cores with a large number of private accelerators. Conversely, non-critical applications are run by cores with few or no private HW accelerators.
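The two configurations described above can be written down as plain data. The following Python sketch is only an illustration: the `Architecture` class, its method names, the accelerator labels "Pr 1", "Pr 3", "Pr 4" and the choice m = 2 for the symmetric case are assumptions made for the example, and symmetry is checked only by accelerator counts, not by accelerator type.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Architecture:
    """Hypothetical description of an Ht-MPSoC accelerator layout."""
    private: Dict[str, List[str]]   # processor -> its private accelerators
    shared: Dict[str, Set[str]]     # shared accelerator -> processors using it

    def profile(self, proc: str) -> Tuple[int, int]:
        """Return (number of private, number of shared) accelerators of proc."""
        n_shared = sum(1 for users in self.shared.values() if proc in users)
        return (len(self.private.get(proc, [])), n_shared)

    def is_symmetric(self) -> bool:
        """Simplified test: every processor has the same accelerator counts."""
        return len({self.profile(p) for p in self.private}) == 1


# Layout described for Fig. 3 (asymmetric): P2 has no private accelerator.
fig3 = Architecture(
    private={"P1": ["Pr 1"], "P2": [], "P3": ["Pr 3"], "P4": ["Pr 4"]},
    shared={"Sh 1": {"P1", "P2"}, "Sh 2": {"P2", "P3", "P4"}},
)

# Layout in the style of Fig. 2 (symmetric), with m = 2 assumed.
everyone = {"P1", "P2", "P3", "P4"}
fig2 = Architecture(
    private={p: ["Pr"] for p in sorted(everyone)},
    shared={"Sh 1": set(everyone), "Sh 2": set(everyone)},
)

print(fig3.profile("P2"), fig3.is_symmetric(), fig2.is_symmetric())
```

In this representation P2 in the Fig. 3 layout has the profile (0, 2), which differs from the (1, 1) profile of the other three processors, so the layout is reported as asymmetric.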

We think that AHt-MPSoC is a very promising class of architectures, as it allows efficient utilization of HW resources and provides high performance with lower energy consumption [5]. However, its use further enlarges the design space of configurations to explore. Thus, the designer needs a Design Space Exploration (DSE) tool to determine the best architectural configuration for a given set of concurrent applications. This tool also plays an important role in the design flow of Ht-MPSoC architectures, as it makes it possible to determine the most efficient Ht-MPSoC configuration in a reduced time. This configuration is the one that requires the fewest FPGA resources and gives a reduced execution time and energy budget.

In the literature, very few works have been devoted to DSE tools for AHt-MPSoC. This is because previous and current generations of FPGA circuits offer relatively few resources, in terms of logic elements, compared to ASICs. It was therefore possible to explore the entire design space in a relatively short time, either by simulation or with simple analytical models. In other studies, the authors consider only SHt-MPSoC architectures, in which all processors have the same number and type of accelerators. This approach limits the applicability of Ht-MPSoC and is not effective for highly complex reconfigurable systems.

In the solution that we propose in this paper, we target next-generation FPGA circuits with a high number of reconfigurable logic elements and their utilization in AHt-MPSoC architectures. In these architectures, the number of HW accelerators and their type may vary from one processor to another. Furthermore, in our model, the applications executed by the processors may differ from one processor to another. Our solution is based on a Mixed Integer Linear Programming (MIP) formulation to explore the very large space of possible configurations. Because enumerating every configuration to find the optimum is too costly, the proposed mathematical model identifies a global minimum of the objective (i.e. area usage) in a reasonable time. After a constraint linearization step, the model was solved using the CPLEX linear programming solver.
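To make the cost of enumeration concrete, the following Python sketch explores a toy design space exhaustively. It is not the MIP model of this paper: it is a brute-force baseline under assumed numbers. The pattern names ("fir", "dct"), the areas, the cycle counts, the per-sharer contention penalty and the deadlines are all hypothetical. Each processor runs both patterns and, for each one, chooses software execution, a private accelerator, or a shared accelerator.

```python
from itertools import product

# All values below are hypothetical and chosen only for illustration.
AREA = {"fir": 40, "dct": 60}            # area units per accelerator instance
SW_CYCLES = {"fir": 900, "dct": 1200}    # cycles when the pattern runs in software
HW_CYCLES = {"fir": 100, "dct": 150}     # cycles on an uncontended accelerator
SHARE_PENALTY = 50                       # assumed extra cycles per additional sharer
DEADLINE = {"P1": 400, "P2": 1500, "P3": 1500}  # cycle budget per processor

PROCS = list(DEADLINE)
PATTERNS = list(AREA)


def evaluate(config):
    """Return (total area, cycles per processor) for config[(proc, pattern)]."""
    area = 0
    times = {p: 0 for p in PROCS}
    for pat in PATTERNS:
        sharers = [p for p in PROCS if config[(p, pat)] == "shared"]
        if sharers:
            area += AREA[pat]  # a single instance serves every sharer
        for p in PROCS:
            choice = config[(p, pat)]
            if choice == "sw":
                times[p] += SW_CYCLES[pat]
            elif choice == "private":
                area += AREA[pat]
                times[p] += HW_CYCLES[pat]
            else:  # shared: accelerator latency plus contention delay
                times[p] += HW_CYCLES[pat] + SHARE_PENALTY * (len(sharers) - 1)
    return area, times


def explore(choices=("sw", "private", "shared")):
    """Enumerate every configuration; keep the smallest feasible area."""
    keys = [(p, pat) for p in PROCS for pat in PATTERNS]
    best = None
    for combo in product(choices, repeat=len(keys)):
        config = dict(zip(keys, combo))
        area, times = evaluate(config)
        feasible = all(times[p] <= DEADLINE[p] for p in PROCS)
        if feasible and (best is None or area < best[0]):
            best = (area, config)
    return best


best_area, best_config = explore()
private_only_area, _ = explore(choices=("sw", "private"))
print(best_area, private_only_area)
```

With 3 processors and 2 patterns the loop visits 3^6 = 729 configurations. Under these assumed numbers the smallest feasible area is 100 units when sharing is allowed and 180 units when it is not. The number of configurations grows exponentially with the number of processors and patterns, which is the reason a solver-based formulation becomes attractive for larger systems.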

The paper consists of 6 sections. In the next section, a survey of existing approaches in Ht-MPSoC is presented. In Section 3, we detail AHt-MPSoC architectures and discuss their benefits. In Section 4, we develop our MIP-based Design Space Exploration (DSE) for AHt-MPSoC. Section 5 presents the experimental results and the performance obtained with our AHt-MPSoC for real and synthetic benchmarks. Finally, in Section 6, we give a conclusion and some possible extensions to make AHt-MPSoC more efficient.

Section snippets

Related works

The integration of custom instructions in FPGA-based MPSoC increases the performance gain by incorporating hardware components to handle computational tasks [6], [7], [8], [9]. Modern platforms, including FPGAs and ASICs, support different couplings of hardware components with the processor. In [10], coupling schemes are classified into two principal modes: closely coupled mode and loosely coupled mode (Fig. 4).

In the first mode, the hardware accelerator is part of the processor data path and

AHt-MPSoC based on hardware sharing

Application-specific instructions are an effective way of improving the performance of processors. In these processors, the execution time of the critical computations is reduced by using new instructions executed on HW accelerators. These HW accelerators can be either loosely coupled to the processor via the system bus or memory controller, or closely coupled to the instruction pipeline. Within this work, the HW accelerators are implemented as hardware modules executing

Mixed integer linear programming model

Our design space exploration addresses how to merge the computational patterns found in the different applications so as to reduce the overall area usage while respecting the applications' performance constraints. Increasing the sharing degree reduces the area usage, but may increase the delay each processor incurs when accessing shared accelerators, so that the required performance is no longer met. This situation cannot be accepted for hard and soft real-time applications. Thus, the goal of our MIP model

Experimental results

To evaluate the performance of the proposed AHt-MPSoC system, as well as to study the effectiveness of our MIP model, we use synthetic and real applications. Our target platform is a Xilinx Virtex-5 FPGA. On this platform, several MicroBlaze soft cores running at 125 MHz can be mapped.

Performance and area usage are reported in clock cycles and area units, respectively. Power consumption has been measured using the Xilinx XPower tool.

Conclusion

In this paper, we presented a new class of Ht-MPSoC architecture. In the proposed Asymmetric Heterogeneous Multiprocessor System-on-Chip (AHt-MPSoC) architectures, hardware accelerators are shared between processors so as to reduce system cost and increase performance. Experimental results have demonstrated that with AHt-MPSoC it is possible to obtain the same performance with fewer logic elements than with an SHt-MPSoC architecture. Our proposed MIP formulation is also able to explore,

Acknowledgment

This work was supported by CMCU project, funded by the Tunisian Ministry of Higher Education and Scientific Research (MESRS) and the French Ministry of Foreign Affairs and International Development.

Bouthaina Dammak received the engineer degree in electronic engineering from the National School of Engineers of Sfax in 2009, and the Master degree in embedded systems design from the same institute. She is currently working toward the Ph.D. degree. Her research interests include multiprocessor architecture optimization for multimedia domains.

References (23)

Xilinx, Zynq-7000 All Programmable SoC Technical Reference Manual....
Altera, Cyclone V....
Micro-Semi, Smart Fusion 2....
Xilinx. 2014 Virtex ultrascale....
D. Bouthaina, B. Mouna, N. Smail, A. Mohamed, Shared hardware accelerator architectures for Heterogeneous MPSoC, in:...
T. Blank, A survey of hardware accelerators used in computer-aided design, IEEE Des. Test Comput. (1984)
N. Howard et al., The use of field-programmable gate arrays for the hardware acceleration of design automation tasks, VLSI Des. (1996)
B. Reagen, Y.S. Shao, Gu-Yeon Wei, D. Brooks, Quantifying acceleration: Power/performance trade-offs of application...
Altera, Hardware acceleration and coprocessing, 2011....
A.G.A. JeetSingh, A. Chhabra, B. Dwivedi, Soc synthesis with automatic hardware software interface generation, in:...

S.S. Sirowy, Y. Wu, F. Vahid, Two-level microprocessor-accelerator partitioning, in: Proceedings of the Conference...


Mouna Baklouti is currently an Assistant Professor at the National Engineering School of Sfax, Tunisia. She received the engineering and M.S. degrees from the Tunisian Polytechnic School, Tunis, Tunisia, in 2006 and 2007, respectively. She received a Ph.D. in Computer Science from the National Engineering School of Sfax, Sfax, Tunisia, and the University of Lille 1, Lille, France, in December 2010. Her research interests include hardware/software co-design, massively parallel systems design and System-on-Chip design.

Mr. Rachid Benmansour was born in Morocco in 1979. He received his M.S. in Industrial Engineering from Mohammadia School of Engineering (Morocco) and his Ph.D. degree in computer science, in 2011, from the University of Pierre and Marie Curie, Paris 6 (France). In 2013, he joined the University of Valenciennes as an associate professor. He has been involved in many national and European research projects. He has also participated in several industrial projects, mainly in transportation (rail, air transport). Mr. Benmansour was a visiting professor in Morocco (Mohammadia School of Engineering) and Germany (Trier University of Applied Sciences) and a visiting scholar in the USA (North Carolina State University) and Tunisia (National School of Engineers of Sfax). His main areas of research interest are mathematics of operations research, operations management and their applications, such as logistics, planning and scheduling, and maintenance. He is a member of the International Association of Railway Operations Research.

Prof. Smail Niar (University of Valenciennes & CNRS, France) received his Ph.D. in Computer Engineering from the University of Lille in 1990. Since then, he has been a professor at the University of Valenciennes. He is the leader of the “Mobile & Embedded Systems” research group at the “Laboratory For Automation, Mechanical and Computer Engineering”, a joint research unit between CNRS and the University of Valenciennes.

Mohamed Abid received the Dipl.-Ing. from the National School of Engineers of Sfax (ENIS) in 1985 and the Ph.D. degree from the National Institution of Applied Science, Toulouse, France. In 2000, he received his doctoral degree in Electrical and Computer Engineering at the National Engineering School of Tunis. He is currently a Professor at the Electrical Department of ENIS. Since 2006, he has been the head of the research laboratory “Computer Embedded System” CES-ENIS. He is responsible for research projects in the area of automatic signal and image processing, wireless networks and information systems. He has been the head of the Federator Research Project since 2009. He has authored or co-authored more than 120 international conference papers, and he has written more than 20 technical contributions to various international standardization projects. He is a member of the Scientific and Program Committees of several international conferences and workshops. He is the co-coordinator of several national and international projects with universities and industries, such as DGRSRT, CNRS, INRIA, CMCU, training for research, PNM, Tempra, etc.

^☆: The work was done as part of a Franco-Tunisian project.
