Exploring the speedups of embedded microprocessor systems utilizing a high-performance coprocessor data-path

Galanis, Michalis D.; Dimitroulakos, Gregory; Goutis, Costas E.

doi:10.1007/s11227-006-0007-2

Published: 02 March 2007

Volume 39, pages 251–271 (2007)

The Journal of Supercomputing


Abstract

The speedups achieved in a generic microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates time-critical code segments, called kernels, thereby increasing overall performance. The data-path was previously introduced by the authors and is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. A design flow that integrates the automated coprocessor synthesis method for executing applications on the system is presented. To evaluate the effectiveness of the coprocessor approach, an analytical exploration with respect to the type of custom data-path and the microprocessor architecture is performed. The kernel and overall application speedups of six real-life applications, relative to software execution on the microprocessor, are estimated using the design flow. Kernel speedups of up to 155 are achieved, resulting in an average overall improvement of 2.78 with a small overhead in circuit area. The design flow accelerates the applications close to the theoretical bounds. A comparison with another high-performance data-path shows that the proposed coprocessor achieves better performance with smaller area-time products for the generated data-paths.
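The gap between a large kernel speedup and a much smaller overall speedup can be illustrated with Amdahl's law. The sketch below is not the authors' analytical model, which is not visible in this preview; it assumes that only the kernels are accelerated and that communication overhead between the microprocessor and the coprocessor is negligible. The function names and the kernel time fraction of 0.6 are hypothetical values chosen for illustration, not figures from the article.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the kernels are accelerated.

    kernel_fraction: share of software execution time spent in kernels (0..1).
    kernel_speedup:  speedup of the kernels on the coprocessor (> 0).
    Assumption: no communication overhead between processor and coprocessor.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining_time = (1.0 - kernel_fraction) + kernel_fraction / kernel_speedup
    return 1.0 / remaining_time


def speedup_bound(kernel_fraction: float) -> float:
    """Upper bound on overall speedup as the kernel speedup grows without limit."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    fraction = 0.6  # hypothetical: kernels take 60% of software execution time
    achieved = overall_speedup(fraction, kernel_speedup=155.0)
    bound = speedup_bound(fraction)
    print(f"overall speedup: {achieved:.3f}, theoretical bound: {bound:.3f}")
```

Under these assumptions, even a kernel speedup of 155 cannot push the overall speedup past the bound set by the non-kernel code, which is why overall improvements stay modest while kernel speedups are large.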



Author information

Authors and Affiliations

VLSI Design Laboratory, ECE Department, University of Patras, Patras, Greece
Michalis D. Galanis, Gregory Dimitroulakos & Costas E. Goutis


Corresponding author

Correspondence to Michalis D. Galanis.


About this article

Cite this article

Galanis, M.D., Dimitroulakos, G. & Goutis, C.E. Exploring the speedups of embedded microprocessor systems utilizing a high-performance coprocessor data-path. J Supercomput 39, 251–271 (2007). https://doi.org/10.1007/s11227-006-0007-2

Issue Date: March 2007
DOI: https://doi.org/10.1007/s11227-006-0007-2
