Abstract
Simultaneous multithreading is a recently proposed technique in which instructions from multiple threads are dispatched and/or issued concurrently in every clock cycle. This technique has been claimed to improve the latency of multithreaded programs and the throughput of multiprogrammed workloads with a minimal increase in hardware complexity. This paper presents a realistic study of the case for simultaneous multithreading. It uses extensive simulations to determine balanced configurations of a multithreaded version of the PowerPC 620, measures their performance on multithreaded benchmarks written using the commercial Pthreads API, and estimates their hardware complexity in terms of increases in die area. Our results show that a balanced 2-threaded 620 achieves a 41.6% to 71.3% speedup over the original 620 on five multithreaded benchmarks, with an estimated 36.4% increase in die area and no impact on single-thread performance. The balanced 4-threaded 620 achieves a 46.9% to 111.6% speedup over the original 620, with an estimated 70.4% increase in die area and a detrimental impact on single-thread performance.
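One way to read the reported numbers side by side is to divide each configuration's relative performance by its relative die area. The short Python sketch below does this using only the figures quoted in the abstract. The ratio itself and the helper name `perf_per_area` are assumptions made here for illustration; the abstract does not say that the paper uses this metric.

```python
def perf_per_area(speedup_pct, area_increase_pct):
    """Relative performance divided by relative die area (original 620 = 1.0).

    Illustrative metric only; it is an assumption, not taken from the paper.
    """
    return (1 + speedup_pct / 100) / (1 + area_increase_pct / 100)


# Figures quoted in the abstract:
# (minimum speedup %, maximum speedup %, estimated die-area increase %)
configs = {
    "2-threaded 620": (41.6, 71.3, 36.4),
    "4-threaded 620": (46.9, 111.6, 70.4),
}

for name, (low, high, area) in configs.items():
    worst = perf_per_area(low, area)
    best = perf_per_area(high, area)
    print(f"{name}: {worst:.2f} to {best:.2f} performance per unit area")
```

A ratio above 1.0 means the speedup outpaces the area growth for that benchmark, and a ratio below 1.0 means the extra area grows faster than the performance gained.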
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chou, Y.C., Siewiorek, D.P., Shen, J.P. (1997). A realistic study on multithreaded superscalar processor design. In: Lengauer, C., Griebl, M., Gorlatch, S. (eds) Euro-Par'97 Parallel Processing. Euro-Par 1997. Lecture Notes in Computer Science, vol 1300. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0002858
DOI: https://doi.org/10.1007/BFb0002858
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63440-9
Online ISBN: 978-3-540-69549-3