Performance and Energy Consumption Improvements in Microprocessor Systems Utilizing a Coprocessor Data-Path

Journal of Signal Processing Systems

Abstract

The speedups and energy reductions achieved in a generic single-chip microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates computationally intensive kernel sections, thereby increasing overall performance. The data-path, previously introduced by the authors, is composed of flexible computational components (FCCs), which can realize any two-level sequence of primitive operations. A method for automatically synthesizing the coprocessor from a high-level software description is presented, together with its integration into a design flow for executing applications on the system. The overall speedups of eleven real-life applications, relative to software-only execution on the microprocessor, are estimated using the design flow. These speedups are close to the theoretical bounds and range from 1.78 to 5.84, with an average of 3.04, while the circuit-area overhead is small. The energy savings range from 41% to 74%, and the application energy-delay product is reduced by 80% on average. A comparison with another high-performance data-path shows that the proposed coprocessor achieves better performance, consumes less energy, and has smaller area–time products for the generated data-paths. Additionally, the FCC data-path accelerates kernels more effectively than a VLIW DSP core.
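The relationship between kernel acceleration, overall speedup, and energy-delay product can be illustrated with a minimal sketch. The abstract does not state which theoretical bound is meant; the sketch assumes the usual Amdahl's-law bound, and the kernel fraction, kernel speedup, and energy-saving inputs below are hypothetical values chosen for illustration, not figures from the article.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's-law application speedup when only the kernel part is accelerated.

    kernel_fraction: share of software execution time spent in kernels (0..1).
    kernel_speedup:  factor by which the coprocessor accelerates those kernels.
    """
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def speedup_bound(kernel_fraction):
    """Upper bound on application speedup (kernel time shrinks to zero)."""
    return 1.0 / (1.0 - kernel_fraction)


def edp_reduction(energy_saving, speedup):
    """Fractional reduction in energy-delay product (energy x execution time).

    energy_saving: fraction of energy saved relative to software-only execution.
    speedup:       overall application speedup (delay shrinks by 1/speedup).
    """
    return 1.0 - (1.0 - energy_saving) / speedup


if __name__ == "__main__":
    # Hypothetical inputs: 80% of the time in kernels, kernels run 10x faster,
    # and the accelerated system uses half the energy.
    s = overall_speedup(0.8, 10.0)
    print(f"overall speedup:  {s:.2f}")
    print(f"speedup bound:    {speedup_bound(0.8):.2f}")
    print(f"EDP reduction:    {edp_reduction(0.5, s):.1%}")
```

Because the energy-delay product multiplies energy by execution time, a system that both saves energy and runs faster reduces it by more than either factor alone, which is consistent with the EDP reduction in the abstract exceeding the reported energy savings.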

Author information

Corresponding author

Correspondence to Michalis D. Galanis.

About this article

Cite this article

Galanis, M.D., Dimitroulakos, G. & Goutis, C.E. Performance and Energy Consumption Improvements in Microprocessor Systems Utilizing a Coprocessor Data-Path. J Sign Process Syst Sign Image 50, 179–200 (2008). https://doi.org/10.1007/s11265-007-0097-y

  • DOI: https://doi.org/10.1007/s11265-007-0097-y
