Improving Adaptability and Per-Core Performance of Many-Core Processors Through Reconfiguration

International Journal of Parallel Programming

Abstract

Increasing the number of cores in a multi-core processor can only be achieved by reducing the resources available in each core, and hence sacrificing per-core performance. Furthermore, a large number of homogeneous cores may not be effective for all applications. For instance, threads with high instruction-level parallelism will under-perform considerably on resource-constrained cores. In this paper, we propose a core architecture that can be adapted either to improve a single thread's performance or to execute multiple threads. In particular, we integrate a Reconfigurable Hardware Unit (RHU) into the resource-constrained cores of a many-core processor. The RHU can be reconfigured to execute the frequently encountered instructions of a thread, increasing the core's overall execution bandwidth and thus improving its performance. Alternatively, if the core's resources are sufficient for a thread, the RHU can be configured to execute instructions from a different thread, increasing thread-level parallelism. The RHU has low area overhead, and hence has minimal impact on the scalability of the number of cores. To further limit the area overhead of this mechanism, generation of the reconfiguration bits for the RHUs of multiple cores is delegated to a single core. In this paper, we present results for using the RHU to improve a single thread's performance. Our experiments show that the proposed architecture improves per-core performance by an average of about 23% across a wide range of applications.
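The two uses of the RHU described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the abstract does not say how the architecture decides between the two modes or how frequently encountered instructions are selected, so the function `choose_rhu_config`, the `RhuConfig` type, the comparison of a thread's ILP against a core limit, and the `slots` parameter are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RhuConfig:
    """Hypothetical description of how an RHU is set up (not from the paper)."""
    mode: str            # "accelerate" or "second_thread"
    opcodes: tuple = ()  # opcodes mapped onto the RHU in "accelerate" mode


def choose_rhu_config(opcode_trace, thread_ilp, core_ilp_limit, slots=2):
    """Pick an RHU mode for one core.

    Assumption: a thread whose ILP fits within the core's own resources
    leaves the RHU free for a second thread; otherwise the RHU is given
    the thread's most frequently encountered opcodes.
    """
    if thread_ilp <= core_ilp_limit:
        # Core resources suffice: use the RHU for thread-level parallelism.
        return RhuConfig(mode="second_thread")
    # Core is the bottleneck: map the hottest opcodes onto the RHU.
    hot = [op for op, _ in Counter(opcode_trace).most_common(slots)]
    return RhuConfig(mode="accelerate", opcodes=tuple(hot))


trace = ["add", "add", "add", "load", "load", "mul"]
high_ilp = choose_rhu_config(trace, thread_ilp=3.0, core_ilp_limit=2.0)
low_ilp = choose_rhu_config(trace, thread_ilp=1.5, core_ilp_limit=2.0)
```

Here `high_ilp` selects the acceleration mode with the two most frequent opcodes in the toy trace, while `low_ilp` leaves the RHU available for another thread. A real design would base both decisions on hardware profiling rather than a software trace.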

Author information

Corresponding author

Correspondence to Tameesh Suri.

About this article

Cite this article

Suri, T., Aggarwal, A. Improving Adaptability and Per-Core Performance of Many-Core Processors Through Reconfiguration. Int J Parallel Prog 38, 203–224 (2010). https://doi.org/10.1007/s10766-010-0128-3

  • DOI: https://doi.org/10.1007/s10766-010-0128-3
