Abstract
Multiprocessor systems offer numerous configurations, differing in the number of cores and frequency levels, any of which may be optimal with respect to energy, performance, or other metrics. On FPGAs, a convenient way to design and build a multiprocessor system is to use soft-core processors. The configuration and frequency of a soft-core processor are customizable at design time, and the number of cores and their configuration can be changed according to the FPGA capacity. In this research, different workloads were studied, and the results show that the speedup differs for each workload because of its behavior on an MPSoC. The results also reveal that, in a soft-core-based multiprocessor system on an FPGA platform, increasing the number of processors and their operating frequency does not always improve the system energy-delay product (EDP). Hence, identifying an optimal configuration with respect to a given metric such as EDP is a complex process because of the large number of workloads and configurations. To address this, we use power consumption and execution time information to identify the optimal configuration for different workloads with respect to EDP. We also perform an extensive workload characterization using the performance counters available on the target platform. Using these counters, a large amount of characterization data was collected for each workload, and we then used this data to choose the optimal configuration for each workload. This paper proposes a characterization method for parallel workloads that can be used to determine the optimal core and frequency configuration of an FPGA-based homogeneous soft-core multiprocessor system with respect to EDP as a function of the workload.
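To make the selection criterion concrete, the following is a minimal sketch of how an EDP-optimal configuration could be picked from measured power and execution time. It assumes the common definition EDP = energy × delay = (power × time) × time. The `Measurement` type, the function names, and all numeric values are hypothetical illustrations, not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One (cores, frequency) configuration measured for a single workload."""
    cores: int        # number of soft-core processors
    freq_mhz: int     # operating frequency in MHz
    power_w: float    # average power in watts
    time_s: float     # execution time in seconds


def edp(m: Measurement) -> float:
    """Energy-delay product in joule-seconds: (power * time) * time."""
    energy_j = m.power_w * m.time_s
    return energy_j * m.time_s


def best_configuration(measurements):
    """Return the measurement with the lowest EDP."""
    return min(measurements, key=edp)


# Made-up numbers for one hypothetical workload (not results from the paper).
samples = [
    Measurement(cores=1, freq_mhz=100, power_w=0.50, time_s=8.0),
    Measurement(cores=2, freq_mhz=100, power_w=0.80, time_s=4.5),
    Measurement(cores=4, freq_mhz=100, power_w=1.50, time_s=3.0),
    Measurement(cores=4, freq_mhz=200, power_w=2.50, time_s=2.8),
]

best = best_configuration(samples)
print(best.cores, best.freq_mhz, edp(best))
```

With these illustrative values, the four-core configuration at the lower frequency has the smallest EDP: doubling the frequency shortens the run only slightly while raising power, which is the kind of behavior the abstract describes.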
Cite this article
Samsami Khodadad, F., Noori, H. Characterizing energy and performance of soft-core-based homogeneous multiprocessor systems. J Supercomput 78, 9079–9101 (2022). https://doi.org/10.1007/s11227-021-04273-7
DOI: https://doi.org/10.1007/s11227-021-04273-7