Design and Effectiveness of Small-Sized Decoupled Dispatch Queues

Ro, Won W.; Gaudiot, Jean-Luc

doi:10.1007/11823285_50

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4128)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Continuing demand for high degrees of Instruction Level Parallelism (ILP) requires large dispatch queues in modern superscalar microprocessors. However, such large queues inevitably bring high circuit complexity, which in turn limits pipeline clock rates. This is because most of today's designs are based on a centralized dispatch queue that relies on global broadcast operations to wake up and select ready instructions. As an alternative to this conventional design, we propose hierarchically distributed dispatch queues based on the access/execute decoupled architecture model. Simulation results on 14 data-intensive benchmarks show that our Decoupled Dispatch Queues (DDQ) design achieves performance comparable to that of a superscalar machine with a large dispatch queue. We also show that DDQ can be built from small, distributed dispatch queues, which can consequently be implemented with low hardware complexity and high clock rates.
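The cost of broadcast-based wake-up can be illustrated with a toy model. The Python sketch below is not the simulator or the DDQ design from the paper; the names `Entry`, `DispatchQueue`, `wakeup`, and `select` are hypothetical, and it assumes one tag comparison per queue entry per broadcast and oldest-first selection. It only shows that the number of comparisons triggered by a single result tag grows with the size of the queue that receives the broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One queued instruction and the result tags it still waits for."""
    name: str
    waiting_on: set = field(default_factory=set)

    @property
    def ready(self):
        return not self.waiting_on


class DispatchQueue:
    """Toy dispatch queue (hypothetical model, not the paper's hardware)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        self.comparisons = 0  # tag comparisons performed so far

    def insert(self, entry):
        if len(self.entries) >= self.capacity:
            raise OverflowError("dispatch queue full")
        self.entries.append(entry)

    def wakeup(self, tag):
        """Broadcast a result tag: every entry compares against it once."""
        for entry in self.entries:
            self.comparisons += 1
            entry.waiting_on.discard(tag)

    def select(self, width):
        """Issue up to `width` ready entries, oldest first."""
        issued = [e for e in self.entries if e.ready][:width]
        for entry in issued:
            self.entries.remove(entry)
        return issued


# Centralized case: one 8-entry queue receives every broadcast.
central = DispatchQueue(8)
for i in range(8):
    central.insert(Entry(f"i{i}", {"t0"} if i % 2 == 0 else {"t1"}))
central.wakeup("t0")
issued_central = central.select(2)

# Distributed case (assumption: tag t0 is only consumed in queue_a,
# so the broadcast stays local to that 4-entry queue).
queue_a = DispatchQueue(4)
queue_b = DispatchQueue(4)
for i in range(4):
    queue_a.insert(Entry(f"a{i}", {"t0"}))
    queue_b.insert(Entry(f"b{i}", {"t1"}))
queue_a.wakeup("t0")
issued_a = queue_a.select(2)
```

In this model the centralized queue performs eight comparisons for the single broadcast, while the distributed configuration performs four in the queue that consumes the tag and none in the other. Real wake-up logic compares each broadcast tag against every source operand of every entry, so the sketch understates the absolute cost but preserves the scaling with queue size.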

This paper is based upon work supported in part by NSF grant CCF-0541403. Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily reflect the views of NSF.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, California State University, Northridge
Won W. Ro
Department of Electrical Engineering and Computer Science, University of California, Irvine
Jean-Luc Gaudiot

Editor information

Editors and Affiliations

ZIH, TU Dresden, Germany
Wolfgang E. Nagel
Fakultät Mathematik, Institut für wissenschaftliches Rechnen, TU Dresden, 01062, Dresden, Germany
Wolfgang V. Walter
Database Technology Group, Technische Universität Dresden, Germany
Wolfgang Lehner

About this paper

Cite this paper

Ro, W.W., Gaudiot, JL. (2006). Design and Effectiveness of Small-Sized Decoupled Dispatch Queues. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds) Euro-Par 2006 Parallel Processing. Euro-Par 2006. Lecture Notes in Computer Science, vol 4128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823285_50

DOI: https://doi.org/10.1007/11823285_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37783-2
Online ISBN: 978-3-540-37784-9
eBook Packages: Computer Science, Computer Science (R0)