Design and implementation of a MicroBlaze-based warp processor

Authors:

Frank Vahid

ACM Transactions on Embedded Computing Systems (TECS), Volume 8, Issue 3

Article No.: 22, Pages 1 - 22

https://doi.org/10.1145/1509288.1509294

Published: 22 April 2009

Abstract

While soft processor cores provided by FPGA vendors offer designers increased flexibility, such processors typically incur penalties in performance and energy consumption compared to hard processor core alternatives. The recently developed technology of warp processing can help reduce those penalties. Warp processing is the dynamic and transparent transformation of critical software regions from microprocessor execution to much faster circuit execution on an FPGA. In this article, we describe an implementation of a warp processor on Xilinx Virtex-II Pro and Spartan3 FPGAs incorporating one or more MicroBlaze soft processor cores. We further provide a detailed analysis of the energy overhead of dynamically partitioning an application's kernels to hardware executing within an FPGA. Considering an implementation that periodically partitions the executing application once every minute, a MicroBlaze-based warp processor implemented on a Spartan3 FPGA achieves average speedups of 5.8× and energy reductions of 49% compared to the MicroBlaze soft processor core alone, providing competitive performance and energy consumption compared to existing hard processor cores.

Index Terms

Computer systems organization

Published In

ACM Transactions on Embedded Computing Systems Volume 8, Issue 3

April 2009

239 pages

ISSN: 1539-9087

EISSN: 1558-3465

DOI: 10.1145/1509288

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 22 April 2009

Accepted: 01 August 2007

Revised: 01 May 2007

Received: 01 February 2007

Published in TECS Volume 8, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Article Metrics

Total Citations: 34
Total Downloads: 951
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 1

Reflects downloads up to 18 Feb 2025

Citations

Cited By

Caraveo-Cacep M, Vázquez-Medina R, Hernández Zavala A (2024). A review on security implementations in soft-processors for IoT applications. Computers and Security 139:C. DOI: 10.1016/j.cose.2023.103677. Online publication date: 16-May-2024
https://dl.acm.org/doi/10.1016/j.cose.2023.103677
Paulino N, Ferreira J, Cardoso J (2020). Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey. ACM Computing Surveys 53:1 (1-36). DOI: 10.1145/3369764. Online publication date: 6-Feb-2020
https://dl.acm.org/doi/10.1145/3369764
Sajjad M, Yusoff M, Ahmed M (2020). Design of Double-Precision Fully-Programmable Computational Unit for FPGA and ASIC. 2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE) (21-26). DOI: 10.1109/iCCECE49321.2020.9231146. Online publication date: 17-Aug-2020
https://doi.org/10.1109/iCCECE49321.2020.9231146
Wajszczyk B (2019). Analysis of using a MicroBlaze processor for hardware implementation of algorithms for data processing in electronic recognition devices and systems based on the example of a XILINX FPGA system. XII Conference on Reconnaissance and Electronic Warfare Systems (59). DOI: 10.1117/12.2525056. Online publication date: 27-Mar-2019
https://doi.org/10.1117/12.2525056
Najjar H, Bourguiba R, Mouine J (2018). A Reusable Hybrid RISC Processor with Programmable Instruction Set. 2018 15th International Multi-Conference on Systems, Signals & Devices (SSD) (1028-1031). DOI: 10.1109/SSD.2018.8570509. Online publication date: Mar-2018
https://doi.org/10.1109/SSD.2018.8570509
Paulino N, Ferreira J, Cardoso J (2017). Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:1 (21-34). DOI: 10.1109/TVLSI.2016.2573640. Online publication date: 1-Jan-2017
https://dl.acm.org/doi/10.1109/TVLSI.2016.2573640
Cardoso J, Coutinho J, Diniz P (2017). Controlling the design and development cycle. Embedded Computing for High Performance (57-98). DOI: 10.1016/B978-0-12-804189-5.00003-X. Online publication date: 2017
https://doi.org/10.1016/B978-0-12-804189-5.00003-X
Najjar H, Bourguiba R, Mouine J (2016). A new programmable ALU architecture for hard-core processor. 2016 13th International Multi-Conference on Systems, Signals & Devices (SSD) (567-570). DOI: 10.1109/SSD.2016.7473751. Online publication date: Mar-2016
https://doi.org/10.1109/SSD.2016.7473751
Paulino N, Ferreira J, Bispo J, Cardoso J, Nebel W, Atienza D (2015). Transparent acceleration of program execution using reconfigurable hardware. Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (1066-1071). DOI: 10.5555/2755753.2757061. Online publication date: 9-Mar-2015
https://dl.acm.org/doi/10.5555/2755753.2757061
Jung L, Hochberger C (2015). Feasibility of high level compiler optimizations in online synthesis. 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) (1-7). DOI: 10.1109/ReConFig.2015.7393310. Online publication date: Dec-2015
https://doi.org/10.1109/ReConFig.2015.7393310
