Dynamic Code Partitioning for Clustered Architectures

International Journal of Parallel Programming

Abstract

Recent works(1) show that delays introduced by the issue and bypass logic will become critical for wide-issue superscalar processors. One proposed solution is to cluster the processor core. Clustered architectures benefit from a less complex, partitioned processor core and thus incur less critical delays. In this paper, we propose dynamic instruction steering logic for these clustered architectures that decides at decode time the cluster in which each instruction is executed. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimize the trade-off between these two factors. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int + 4 fp) machine and that it outperforms previous proposals, both static and dynamic.
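
The trade-off between inter-cluster communication and workload balance can be illustrated with a toy steering function. This is only a sketch under stated assumptions, not the scheme evaluated in the paper: the names `SteeringState`, `steer`, `reg_home`, and `imbalance_threshold`, the two-cluster default, and the threshold value are all hypothetical. It sends each instruction to the cluster that needs the fewest operand copies, unless that cluster is too far ahead of the least loaded one.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringState:
    """Hypothetical decode-time bookkeeping for a clustered core."""
    num_clusters: int = 2
    imbalance_threshold: int = 4  # assumed tuning knob, not taken from the paper
    load: list = field(default_factory=list)      # instructions steered to each cluster
    reg_home: dict = field(default_factory=dict)  # register -> cluster of its latest producer

    def __post_init__(self):
        if not self.load:
            self.load = [0] * self.num_clusters


def steer(state, srcs, dst):
    """Choose a cluster for one instruction and update the bookkeeping."""

    def copies(cluster):
        # Operands produced in another cluster would need an inter-cluster copy.
        # Registers with no recorded producer are assumed available everywhere.
        return sum(1 for r in srcs if state.reg_home.get(r, cluster) != cluster)

    clusters = range(state.num_clusters)
    least_loaded = min(clusters, key=lambda c: state.load[c])
    fewest_copies = min(clusters, key=lambda c: (copies(c), state.load[c]))

    # Prefer low communication; fall back to balancing when the gap is too large.
    imbalance = state.load[fewest_copies] - state.load[least_loaded]
    chosen = least_loaded if imbalance > state.imbalance_threshold else fewest_copies

    state.load[chosen] += 1
    if dst is not None:
        state.reg_home[dst] = chosen
    return chosen
```

In this sketch a long dependence chain stays in one cluster until the load gap exceeds the threshold, at which point the next instruction is moved and pays for a copy. A real steering unit would also decrement the load counters as instructions issue or retire, which is omitted here.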

REFERENCES

  1. S. Palacharla, N. P. Jouppi, and J. E. Smith, Complexity-effective superscalar processors, Proc. 24th Int'l. Symp. on Comp. Architecture, pp. 1-13 (June 1997).

  2. S. Palacharla and J. E. Smith, Decoupling integer execution in superscalar processors, Proc. 28th Ann. Symp. on Microarchitecture, pp. 285-290 (November 1995).

  3. S. S. Sastry, S. Palacharla, and J. E. Smith, Exploiting idle floating-point resources for integer execution, Proc. Int'l. Conf. Progr. Lang. Design and Implementation, pp. 118-129 (June 1998).

  4. K. I. Farkas, P. Chow, N. P. Jouppi, and Z. Vranesic, The multicluster architecture: Reducing cycle time through partitioning, Proc. 30th Ann. Symp. on Microarchitecture, pp. 149-159 (December 1997).

  5. G. A. Kemp and M. Franklin, PEWs: A decentralized dynamic scheduler for ILP processing, Proc. of the Int'l. Conf. on Parallel Processing, pp. 239-246 (August 1996).

  6. L. Gwennap, Digital 21264 sets new standard, Microprocessor Report 10(14):11-16 (October 1996).

  7. D. Burger, T. M. Austin, and S. Bennett, Evaluating future microprocessors: The SimpleScalar tool set, Technical Report CS-TR-96-1308, University of Wisconsin-Madison (1996).

  8. Standard Performance Evaluation Corporation, SPEC Newsletter (September 1995).

  9. C. Lee, M. Potkonjak, and W. H. Mangione-Smith, Mediabench: A tool for evaluating and synthesizing multimedia and communications systems, Proc. IEEE-ACM Int'l. Symp. on Microarchitecture (MICRO 30), pp. 330-335 (December 1997).

  10. D. Matzke, Will physical scalability sabotage performance gains, IEEE Computer 30(9):37-39 (September 1997).

  11. R. Canal, J. M. Parcerisa, and A. González, Dynamic cluster assignment mechanisms, Proc. Sixth Int'l. Symp. on High Performance Comp. Arch., pp. 133-142 (January 2000).

  12. K. I. Farkas, Memory-System Design Considerations for Dynamically-Scheduled Microprocessors, Ph.D. thesis, Department of Electrical and Computer Engineering, University of Toronto, Canada (January 1997).

  13. J. E. Smith, Decoupled access-execute computer architectures, ACM Trans. Computer Syst. 2(4):289-308 (November 1984).

  14. M. Franklin, The multiscalar architecture, Ph.D. thesis, Technical Report TR 1196, Computer Sciences Department, University of Wisconsin-Madison (1993).

  15. G. S. Sohi, S. E. Breach, and T. N. Vijaykumar, Multiscalar processors, Proc. 22nd Int'l. Symp. on Computer Architecture, pp. 414-425 (June 1995).

  16. E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. E. Smith, Trace processors, Proc. 30th Ann. Symp. on Microarchitecture, pp. 138-148 (December 1997).

  17. S. Vajapeyam and T. Mitra, Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences, Proc. Int'l. Symp. on Computer Architecture, pp. 1-12 (June 1997).

  18. P. Marcuello and A. González, Clustered speculative multithreaded processors, Proc. 13th ACM Int'l. Conf. on Supercomputing, pp. 365-372 (June 1999).

  19. M. M. Fernandes, J. Llosa, and N. Topham, Distributed modulo scheduling, Proc. Fifth Int'l. Symp. on High Performance Computer Architecture, pp. 130-134 (January 1999).

  20. E. Nystrom and A. E. Eichenberger, Effective cluster assignment for modulo scheduling, Proc. 31st Ann. Symp. on Microarchitecture, pp. 103-114 (1998).

  21. L. Gwennap, Intel's MMX speeds multimedia instructions, Microprocessor Report 10(3):1 (March 1996).

Author information

Corresponding author

Correspondence to Ramon Canal.

About this article

Cite this article

Canal, R., Parcerisa, JM. & González, A. Dynamic Code Partitioning for Clustered Architectures. International Journal of Parallel Programming 29, 59–79 (2001). https://doi.org/10.1023/A:1026483904675
