Exploring the speedups of embedded microprocessor systems utilizing a high-performance coprocessor data-path

The Journal of Supercomputing

Abstract

The speedups achieved in a generic microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates time-critical code segments, called kernels, thereby increasing overall performance. The data-path, previously introduced by the authors, is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. A design flow that integrates the automated coprocessor synthesis method for executing applications on the system is presented. To evaluate the effectiveness of the coprocessor approach, an analytical exploration with respect to the type of custom data-path and the microprocessor architecture is performed. The kernel and overall application speedups of six real-life applications, relative to software execution on the microprocessor, are estimated using the design flow. Kernel speedups of up to 155 are achieved, resulting in an average overall improvement of 2.78 with a small overhead in circuit area. The design flow accelerated the applications close to the theoretical bounds. A comparison with another high-performance data-path showed that the proposed coprocessor achieves better performance with smaller area-time products for the generated data-paths.
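
The gap between a large kernel speedup and a modest overall speedup can be illustrated with Amdahl's law. The sketch below is an assumption about how such a relation is commonly modeled, not the estimation method used in the article: it ignores communication overhead between the microprocessor and the coprocessor, and the kernel fraction of 0.6 is a hypothetical value chosen for illustration. Only the kernel speedup of 155 is taken from the abstract.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the kernels are accelerated.

    kernel_fraction: share of the software execution time spent in kernels (0..1).
    kernel_speedup:  speedup of the kernels on the coprocessor (> 0).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Non-kernel code runs at its original speed; kernel time shrinks.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def speedup_bound(kernel_fraction: float) -> float:
    """Theoretical bound: the limit as the kernel speedup grows without bound."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    fraction = 0.6          # hypothetical kernel share, not from the article
    kernel_gain = 155.0     # largest kernel speedup reported in the abstract
    print(f"overall speedup: {overall_speedup(fraction, kernel_gain):.3f}")
    print(f"theoretical bound: {speedup_bound(fraction):.3f}")
```

Under this model the overall speedup approaches, but never exceeds, the bound set by the non-accelerated part of the application, which is one way to read the statement that the applications were accelerated close to the theoretical bounds.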

Author information

Corresponding author

Correspondence to Michalis D. Galanis.

About this article

Cite this article

Galanis, M.D., Dimitroulakos, G. & Goutis, C.E. Exploring the speedups of embedded microprocessor systems utilizing a high-performance coprocessor data-path. J Supercomput 39, 251–271 (2007). https://doi.org/10.1007/s11227-006-0007-2

  • DOI: https://doi.org/10.1007/s11227-006-0007-2
