Abstract
This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory (DSM) multiprocessors built out of heterogeneous, single-chip computing nodes. Area-efficiency arguments motivate a heterogeneous, hierarchical organization (HDSM) consisting of a few processors with extensive support for instruction-level parallelism and large caches, and a larger number of simpler processors with smaller caches for efficient execution of thread-parallel code. Such a heterogeneous machine relies on the execution of multiple threads per processor to deliver high performance for unmodified applications. This paper quantitatively studies the performance of HDSMs for software-based and hardware-multithreaded scenarios. The simulation-based experiments in this paper consider a 16-node multiprocessor, six homogeneous shared-memory benchmarks from the SPLASH-2 suite, and a decision-support application (C4.5). Simulation results show that a hardware-based, block-multithreaded HDSM configuration outperforms a software-multithreaded counterpart, on average, by 13%.
This work was partially funded by the National Science Foundation under grants CCR-9970728 and EIA-9975275. Renato Figueiredo is also supported by a CAPES scholarship.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Figueiredo, R.J.O., Bradford, J.P., Fortes, J.A.B. (2001). Improving the Performance of Heterogeneous DSMs via Multithreading. In: Palma, J.M.L.M., Dongarra, J., Hernández, V. (eds) Vector and Parallel Processing — VECPAR 2000. VECPAR 2000. Lecture Notes in Computer Science, vol 1981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44942-6_13
DOI: https://doi.org/10.1007/3-540-44942-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41999-0
Online ISBN: 978-3-540-44942-3
eBook Packages: Springer Book Archive