Abstract
In this paper we propose a new partitioning of the processor units that avoids inter-unit communication for instruction dispatch, memory accesses and control flow computations. The motivation is the increasing importance of interchip signalling delays. The technique consists of separating the instruction set into types, e.g. integer, floating point and graphics, and the die into corresponding units, each including a private program counter (PC), an instruction cache, a prediction unit, a branch unit, a load/store unit and a data cache. Every type can fully handle data and pointer computations as well as typed address pointers. Hence the integer, floating point and graphics machines are largely independent and require no inter-machine communication. We justify our proposal by showing that the main communication paths can be substantially shortened: the fetch path length can be halved, the data load path length can be reduced by one third, and the interconnection paths between computation units can be greatly simplified, serving only for conversion purposes.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goossens, B. (2001). Typing the ISA to Cluster the Processor. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2001. Lecture Notes in Computer Science, vol 2127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44743-1_23
DOI: https://doi.org/10.1007/3-540-44743-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42522-9
Online ISBN: 978-3-540-44743-6
eBook Packages: Springer Book Archive