Abstract
Recent works (1) show that delays introduced in the issue and bypass logic will become critical for wide-issue superscalar processors. One of the proposed solutions is clustering the processor core. Clustered architectures benefit from a less complex, partitioned processor core and thus incur less critical delays. In this paper, we propose dynamic instruction steering logic for these clustered architectures that decides at decode time the cluster in which each instruction is executed. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimize the trade-off between these two factors. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int + 4 fp) machine and that it outperforms previous proposals, both static and dynamic.
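To make the trade-off between inter-cluster communication and workload balance concrete, the following is a minimal sketch of one possible decode-time steering heuristic. It is not the scheme evaluated in the paper: the names `SteeringState`, `steer`, `imbalance_threshold`, `pending`, and `reg_home` are hypothetical, and the policy (follow the source operands unless the load difference exceeds a threshold) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringState:
    """Hypothetical bookkeeping for a clustered core (illustrative only)."""
    num_clusters: int = 2
    imbalance_threshold: int = 4  # assumed tuning knob, not from the paper
    pending: list = field(default_factory=list)   # instructions steered to each cluster
    reg_home: dict = field(default_factory=dict)  # register -> cluster producing it

    def __post_init__(self):
        if not self.pending:
            self.pending = [0] * self.num_clusters


def steer(state, dest, sources):
    """Choose a cluster for one decoded instruction.

    Returns (cluster, copies), where copies is the number of source
    operands that would have to be sent from another cluster.
    """
    clusters = range(state.num_clusters)

    # Count how many known source operands are produced in each cluster.
    votes = [0] * state.num_clusters
    for reg in sources:
        home = state.reg_home.get(reg)
        if home is not None:
            votes[home] += 1

    # Communication-friendly choice: most operands, ties go to the lighter cluster.
    preferred = max(clusters, key=lambda c: (votes[c], -state.pending[c]))
    # Balance-friendly choice: the cluster with the fewest steered instructions.
    least_loaded = min(clusters, key=lambda c: state.pending[c])

    # Follow the operands unless the load difference exceeds the threshold.
    gap = state.pending[preferred] - state.pending[least_loaded]
    chosen = least_loaded if gap > state.imbalance_threshold else preferred

    # Operands living elsewhere need an inter-cluster copy.
    copies = sum(votes) - votes[chosen]

    # This sketch never retires instructions, so `pending` only grows;
    # a fuller model would decrement it when instructions issue.
    state.pending[chosen] += 1
    if dest is not None:
        state.reg_home[dest] = chosen
    return chosen, copies
```

With a small threshold, a dependent chain stays in one cluster until the imbalance grows too large, at which point the next instruction is moved at the cost of one operand copy. A larger threshold favors fewer copies, a smaller one favors balance.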
REFERENCES
S. Palacharla, N. P. Jouppi, and J. E. Smith, Complexity-effective superscalar processors, Proc. 24th Int'l. Symp. on Comp. Architecture, pp. 1-13 (June 1997).
S. Palacharla and J. E. Smith, Decoupling integer execution in superscalar processors, Proc. 28th Ann. Symp. on Microarchitecture, pp. 285-290 (November 1995).
S. S. Sastry, S. Palacharla, and J. E. Smith, Exploiting idle floating-point resources for integer execution, Proc. Int'l. Conf. Progr. Lang. Design and Implementation, pp. 118-129 (June 1998).
K. I. Farkas, P. Chow, N. P. Jouppi, and Z. Vranesic, The multicluster architecture: Reducing cycle time through partitioning, Proc. 30th Ann. Symp. on Microarchitecture, pp. 149-159 (December 1997).
G. A. Kemp and M. Franklin, PEWs: A decentralized dynamic scheduler for ILP processing, Proc. of the Int'l. Conf. on Parallel Processing, pp. 239-246 (August 1996).
L. Gwennap, Digital 21264 sets new standard, Microprocessor Report 10(14):11-16 (October 1996).
D. Burger, T. M. Austin, and S. Bennett, Evaluating future microprocessors: The SimpleScalar tool set, Technical Report CS-TR-96-1308, University of Wisconsin-Madison (1996).
Standard Performance Evaluation Corporation, SPEC Newsletter (September 1995).
C. Lee, M. Potkonjak, and W. H. Mangione-Smith, Mediabench: A tool for evaluating and synthesizing multimedia and communications systems, Proc. IEEE-ACM Int'l. Symp. on Microarchitecture (MICRO 30), pp. 330-335 (December 1997).
D. Matzke, Will physical scalability sabotage performance gains, IEEE Computer 30(9):37-39 (September 1997).
R. Canal, J. M. Parcerisa, and A. Gonzalez, Dynamic cluster assignment mechanisms, Proc. Sixth Int'l. Symp. on High Performance Comp. Arch., pp. 133-142 (January 2000).
K. I. Farkas, Memory-System Design Considerations for Dynamically-Scheduled Microprocessors, Ph.D. thesis, Department of Electrical and Computer Engineering, University of Toronto, Canada (January 1997).
J. E. Smith, Decoupled access/execute computer architectures, ACM Trans. Computer Syst. 2(4):289-308 (November 1984).
M. Franklin, The multiscalar architecture, Ph.D. thesis, Technical Report TR 1196, Computer Sciences Department, University of Wisconsin-Madison (1993).
G. S. Sohi, S. E. Breach, and T. N. Vijaykumar, Multiscalar processors, Proc. 22nd Int'l. Symp. on Computer Architecture, pp. 414-425 (June 1995).
E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. E. Smith, Trace processors, Proc. 30th Ann. Symp. on Microarchitecture, pp. 138-148 (December 1997).
S. Vajapeyam and T. Mitra, Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences, Proc. Int'l. Symp. on Computer Architecture, pp. 1-12 (June 1997).
P. Marcuello and A. González, Clustered speculative multithreaded processors, Proc. 13th ACM Int'l. Conf. on Supercomputing, pp. 365-372 (June 1999).
M. M. Fernandes, J. Llosa, and N. Topham, Distributed modulo scheduling, Proc. Fifth Int'l. Symp. on High Performance Computer Architecture, pp. 130-134 (January 1999).
E. Nystrom and A. E. Eichenberger, Effective cluster assignment for modulo scheduling, Proc. 31st Ann. Symp. on Microarchitecture, pp. 103-114 (1998).
L. Gwennap, Intel's MMX speeds multimedia instructions, Microprocessor Report 10(3):1 (March 1996).
Cite this article
Canal, R., Parcerisa, JM. & González, A. Dynamic Code Partitioning for Clustered Architectures. International Journal of Parallel Programming 29, 59–79 (2001). https://doi.org/10.1023/A:1026483904675