Abstract
The speedups achieved in a generic microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates time-critical code segments, called kernels, thereby increasing overall performance. The data-path, previously introduced by the authors, is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. A design flow for executing applications on the system, which integrates the automated coprocessor synthesis method, is presented. To evaluate the effectiveness of our coprocessor approach, an analytical exploration is performed with respect to the type of custom data-path and the microprocessor architecture. Using the design flow, the kernel and overall application speedups of six real-life applications, relative to software execution on the microprocessor, are estimated. Kernel speedups of up to 155 are achieved, resulting in an average overall application speedup of 2.78 with a small circuit-area overhead. The design flow accelerated the applications close to their theoretical bounds. A comparison with another high-performance data-path showed that the proposed coprocessor achieves better performance while yielding smaller area-time products for the generated data-paths.
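Why a kernel speedup of up to 155 translates into an average overall speedup of only about 2.78 can be illustrated with Amdahl's law: only the kernel share of the execution time is accelerated, so the remaining software part bounds the overall gain. The sketch below assumes the plain Amdahl model. It ignores any coprocessor communication or data-transfer overhead, and the 65% kernel share used in the usage example is a hypothetical value rather than a figure from the paper. The functions `overall_speedup` and `speedup_bound` are illustrative names and do not reproduce the authors' design flow.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the kernel portion is accelerated.

    kernel_fraction: share of the original software execution time spent in kernels (0..1).
    kernel_speedup:  speedup of the kernels on the coprocessor relative to software.
    Assumption: no coprocessor communication overhead is modeled.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Unaccelerated time plus accelerated kernel time, normalized to the original runtime.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def speedup_bound(kernel_fraction: float) -> float:
    """Upper bound on overall speedup as the kernel speedup grows without limit."""
    if kernel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    f = 0.65  # hypothetical kernel share of execution time, not taken from the paper
    s = overall_speedup(f, 155.0)
    print(f"overall speedup: {s:.2f}, bound: {speedup_bound(f):.2f}")
```

With the assumed 65% kernel share, a kernel speedup of 155 gives an overall speedup of about 2.82, already close to the bound of about 2.86. This illustrates how large kernel speedups push applications toward their theoretical limit.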
Cite this article
Galanis, M.D., Dimitroulakos, G. & Goutis, C.E. Exploring the speedups of embedded microprocessor systems utilizing a high-performance coprocessor data-path. J Supercomput 39, 251–271 (2007). https://doi.org/10.1007/s11227-006-0007-2