research-article

Design and implementation of a MicroBlaze-based warp processor

Published: 22 April 2009

Abstract

While soft processor cores provided by FPGA vendors offer designers increased flexibility, such processors typically incur penalties in performance and energy consumption compared to hard processor core alternatives. Warp processing, a recently developed technology, can help reduce those penalties. Warp processing is the dynamic and transparent transformation of critical software regions from microprocessor execution to much faster circuit execution on an FPGA. In this article, we describe an implementation of a warp processor on Xilinx Virtex-II Pro and Spartan3 FPGAs incorporating one or more MicroBlaze soft processor cores. We further provide a detailed analysis of the energy overhead of dynamically partitioning an application's kernels to hardware executing within an FPGA. Considering an implementation that repartitions the executing application once every minute, a MicroBlaze-based warp processor implemented on a Spartan3 FPGA achieves average speedups of 5.8× and energy reductions of 49% compared to the MicroBlaze soft processor core alone, providing performance and energy consumption competitive with existing hard processor cores.
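The abstract summarizes results as an average speedup and a fractional energy reduction for a system that repartitions once per minute. The Python sketch below is a hypothetical first-order model, not the article's own analysis, showing how such figures can combine: overall speedup follows Amdahl's law from an assumed kernel fraction and kernel speedup, and the energy of each partitioning pass is amortized over the one-minute period. The class name, every parameter, and the assumption that the processor stays powered while the FPGA circuits run are illustrative choices, not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class WarpEnergyModel:
    """Hypothetical first-order warp-processor model; not the article's model."""
    kernel_fraction: float  # share of software-only runtime spent in critical kernels (0..1)
    kernel_speedup: float   # speedup of those kernels when executed as FPGA circuits
    p_cpu_w: float          # soft processor power (W), assumed constant while powered
    p_fpga_w: float         # extra power (W) drawn by the FPGA circuits while they run
    e_partition_j: float    # energy (J) of one dynamic partitioning pass
    period_s: float = 60.0  # repartition once per minute, as in the abstract

    def runtime_ratio(self) -> float:
        """Warp runtime as a fraction of software-only runtime (Amdahl's law)."""
        f, s = self.kernel_fraction, self.kernel_speedup
        return (1.0 - f) + f / s

    def speedup(self) -> float:
        """Overall speedup relative to the soft processor alone."""
        return 1.0 / self.runtime_ratio()

    def energy_reduction(self, t_sw_s: float) -> float:
        """Fractional energy saved versus running t_sw_s seconds of software alone."""
        e_sw = self.p_cpu_w * t_sw_s
        t_warp = t_sw_s * self.runtime_ratio()
        t_kernels = t_sw_s * self.kernel_fraction / self.kernel_speedup
        # Assumption: the processor stays powered for the whole warp runtime.
        e_warp = (self.p_cpu_w * t_warp
                  + self.p_fpga_w * t_kernels
                  # Partitioning energy amortized over the (fractional) number of periods.
                  + self.e_partition_j * t_warp / self.period_s)
        return 1.0 - e_warp / e_sw


# Made-up inputs that only exercise the model.
model = WarpEnergyModel(kernel_fraction=0.9, kernel_speedup=20.0,
                        p_cpu_w=1.0, p_fpga_w=0.5, e_partition_j=2.0)
print(f"speedup ~ {model.speedup():.2f}x")
print(f"energy reduction ~ {model.energy_reduction(600.0):.1%}")
```

With these made-up inputs the script prints a speedup of about 6.90x and an energy reduction of about 82.8%; the numbers only exercise the model and say nothing about the article's measured results. Raising `e_partition_j` or shortening `period_s` shows how partitioning overhead erodes the energy savings.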



Information & Contributors

Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 8, Issue 3
April 2009
239 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1509288
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2009
Accepted: 01 August 2007
Revised: 01 May 2007
Received: 01 February 2007
Published in TECS Volume 8, Issue 3

Author Tags

  1. FPGA
  2. Warp processors
  3. configurable logic
  4. dynamic optimization
  5. hardware/software partitioning
  6. just-in-time (JIT) compilation
  7. soft processor cores

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 1
Reflects downloads up to 18 Feb 2025

Citations

Cited By

  • (2024) A review on security implementations in soft-processors for IoT applications. Computers and Security 139:C. DOI: 10.1016/j.cose.2023.103677. Online publication date: 16-May-2024.
  • (2020) Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey. ACM Computing Surveys 53:1, 1-36. DOI: 10.1145/3369764. Online publication date: 6-Feb-2020.
  • (2020) Design of Double-Precision Fully-Programmable Computational Unit for FPGA and ASIC. 2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE), 21-26. DOI: 10.1109/iCCECE49321.2020.9231146. Online publication date: 17-Aug-2020.
  • (2019) Analysis of using a MicroBlaze processor for hardware implementation of algorithms for data processing in electronic recognition devices and systems based on the example of a XILINX FPGA system. XII Conference on Reconnaissance and Electronic Warfare Systems, 59. DOI: 10.1117/12.2525056. Online publication date: 27-Mar-2019.
  • (2018) A Reusable Hybrid RISC Processor with Programmable Instruction Set. 2018 15th International Multi-Conference on Systems, Signals & Devices (SSD), 1028-1031. DOI: 10.1109/SSD.2018.8570509. Online publication date: Mar-2018.
  • (2017) Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:1, 21-34. DOI: 10.1109/TVLSI.2016.2573640. Online publication date: 1-Jan-2017.
  • (2017) Controlling the design and development cycle. Embedded Computing for High Performance, 57-98. DOI: 10.1016/B978-0-12-804189-5.00003-X. Online publication date: 2017.
  • (2016) A new programmable ALU architecture for hard-core processor. 2016 13th International Multi-Conference on Systems, Signals & Devices (SSD), 567-570. DOI: 10.1109/SSD.2016.7473751. Online publication date: Mar-2016.
  • (2015) Transparent acceleration of program execution using reconfigurable hardware. Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 1066-1071. DOI: 10.5555/2755753.2757061. Online publication date: 9-Mar-2015.
  • (2015) Feasibility of high level compiler optimizations in online synthesis. 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 1-7. DOI: 10.1109/ReConFig.2015.7393310. Online publication date: Dec-2015.
