Improving the Performance of Heterogeneous DSMs via Multithreading

Figueiredo, Renato J. O.; Bradford, Jeffrey P.; Fortes, José A. B.

doi:10.1007/3-540-44942-6_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1981)

Included in the following conference series:

International Conference on Vector and Parallel Processing

Abstract

This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory (DSM) multiprocessors built out of heterogeneous, single-chip computing nodes. Area-efficiency arguments motivate a heterogeneous, hierarchical organization (HDSM) consisting of few processors with extensive support for instruction-level parallelism and large caches, and a larger number of simpler processors with smaller caches for efficient execution of thread-parallel code. Such a heterogeneous machine relies on the execution of multiple threads per processor to deliver high performance for unmodified applications. This paper quantitatively studies the performance of HDSMs for software-based and hardware-multithreaded scenarios. The simulation-based experiments in this paper consider a 16-node multiprocessor, six homogeneous shared-memory benchmarks from the SPLASH-2 suite, and a decision-support application (C4.5). Simulation results show that a hardware-based, block-multithreaded HDSM configuration outperforms a software-multithreaded counterpart, on average, by 13%.

This work was partially funded by the National Science Foundation under grants CCR-9970728 and EIA-9975275. Renato Figueiredo is also supported by a CAPES scholarship.

Author information

Authors and Affiliations

School of ECE, Purdue University, 47907, West Lafayette, IN, USA
Renato J. O. Figueiredo, Jeffrey P. Bradford & José A. B. Fortes

Editor information

Editors and Affiliations

Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
José M. L. M. Palma
Department of Computer Science, University of Tennessee, 37996-1301, Knoxville, TN, USA
Jack Dongarra
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera, s/n, Apartado 22012, E-46020, Valencia, Spain
Vicente Hernández

About this paper

Cite this paper

Figueiredo, R.J.O., Bradford, J.P., Fortes, J.A.B. (2001). Improving the Performance of Heterogeneous DSMs via Multithreading. In: Palma, J.M.L.M., Dongarra, J., Hernández, V. (eds) Vector and Parallel Processing — VECPAR 2000. VECPAR 2000. Lecture Notes in Computer Science, vol 1981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44942-6_13

DOI: https://doi.org/10.1007/3-540-44942-6_13
Published: 11 May 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41999-0
Online ISBN: 978-3-540-44942-3
eBook Packages: Springer Book Archive
