Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization

The Journal of Supercomputing

Abstract

Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique for enhancing the performance and energy efficiency of embedded processors. However, executing custom instructions requires adding custom functional units to the base processor. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market requirements and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is a multi-cycle matrix of functional units with conditional-execution capability. To make custom instructions more effective, they are extended across basic-block boundaries; to this end, multi-exit custom instructions and the intuition behind them are introduced. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Because the proposed RFU has limited hardware resources (i.e., connections and processing elements), an integrated mapping and temporal partitioning framework is proposed to guarantee that the generated custom instructions can be mapped onto the RFU (mappable custom instructions). Experimental results show that multi-exit custom instructions improve performance and energy efficiency by an average of 32% and 3%, respectively, compared with custom instructions limited to one basic block. On the MiBench benchmark suite, a maximum speedup of 4.9 and an average speedup of 1.9 were achieved over a single-issue embedded processor. The maximum and average energy savings are 56% and 22%, respectively. These performance and energy-efficiency gains are obtained at the cost of a 30% area overhead.
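
The idea of a mappable custom instruction can be illustrated with a minimal sketch. It is not the framework proposed in the article: the abstract does not give the RFU dimensions, its connection constraints, or the partitioning algorithm, so the `rows` by `cols` matrix model, the level-based fit check, and the greedy splitting below are all assumptions made for illustration, and every function name is hypothetical.

```python
from collections import defaultdict

def asap_levels(nodes, edges):
    """Return {node: level}; level is the longest path from any source node."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    level = {}

    def visit(n):
        if n not in level:
            level[n] = 1 + max((visit(p) for p in preds[n]), default=-1)
        return level[n]

    for n in nodes:
        visit(n)
    return level

def is_mappable(nodes, edges, rows, cols):
    """Assumed fit check: depth must fit in `rows`, each level in `cols`."""
    level = asap_levels(nodes, edges)
    width = defaultdict(int)
    for n in nodes:
        width[level[n]] += 1
    return all(l < rows for l in width) and all(w <= cols for w in width.values())

def temporal_partition(nodes, edges, rows, cols):
    """Greedily split a dataflow graph into pieces that each pass is_mappable.

    Nodes are visited in level order (a topological order), so an earlier
    partition never depends on a later one.
    """
    level = asap_levels(nodes, edges)
    order = sorted(nodes, key=lambda n: level[n])
    partitions, current = [], []
    for n in order:
        trial = current + [n]
        inner = [(s, d) for s, d in edges if s in trial and d in trial]
        if current and not is_mappable(trial, inner, rows, cols):
            partitions.append(current)   # close the full partition
            current = [n]                # start a new one with this node
        else:
            current = trial
    if current:
        partitions.append(current)
    return partitions

# Hypothetical five-operation subgraph on an assumed 2x2 matrix.
ops = ["a", "b", "c", "d", "e"]
deps = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
parts = temporal_partition(ops, deps, rows=2, cols=2)
```

In this toy model the five-operation chain is four levels deep, so it does not fit a two-row matrix as a whole and is split into pieces that are executed one after another. A real framework would also have to account for the limited connections between functional units, which this sketch ignores.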

Author information

Corresponding author

Correspondence to Hamid Noori.

Additional information

A first version of this work appeared in Design Automation and Test in Europe (DATE), 2007 under the title “Generating and Executing Multi-Exit Custom Instructions for an Adaptive Extensible Processor” and International Symposium on Low Power Electronics and Design (ISLPED), 2008 under the title “Enhancing Energy Efficiency of Processor-Based Embedded Systems through Post-Fabrication ISA Extension”.

About this article

Cite this article

Noori, H., Mehdipour, F., Inoue, K. et al. Improving performance and energy efficiency of embedded processors via post-fabrication instruction set customization. J Supercomput 60, 196–222 (2012). https://doi.org/10.1007/s11227-010-0505-0

  • DOI: https://doi.org/10.1007/s11227-010-0505-0
