Improving the Performance of Heterogeneous DSMs via Multithreading

  • Conference paper
Vector and Parallel Processing — VECPAR 2000 (VECPAR 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1981))

Abstract

This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory (DSM) multiprocessors built out of heterogeneous, single-chip computing nodes. Area-efficiency arguments motivate a heterogeneous, hierarchical organization (HDSM) consisting of a few processors with extensive support for instruction-level parallelism and large caches, and a larger number of simpler processors with smaller caches for efficient execution of thread-parallel code. Such a heterogeneous machine relies on the execution of multiple threads per processor to deliver high performance for unmodified applications. This paper quantitatively studies the performance of HDSMs for software-based and hardware-multithreaded scenarios. The simulation-based experiments in this paper consider a 16-node multiprocessor, six homogeneous shared-memory benchmarks from the SPLASH-2 suite, and a decision-support application (C4.5). Simulation results show that a hardware-based, block-multithreaded HDSM configuration outperforms a software-multithreaded counterpart, on average, by 13%.

This work was partially funded by the National Science Foundation under grants CCR-9970728 and EIA-9975275. Renato Figueiredo is also supported by a CAPES scholarship.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Figueiredo, R.J.O., Bradford, J.P., Fortes, J.A.B. (2001). Improving the Performance of Heterogeneous DSMs via Multithreading. In: Palma, J.M.L.M., Dongarra, J., Hernández, V. (eds) Vector and Parallel Processing — VECPAR 2000. VECPAR 2000. Lecture Notes in Computer Science, vol 1981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44942-6_13

  • DOI: https://doi.org/10.1007/3-540-44942-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41999-0

  • Online ISBN: 978-3-540-44942-3

  • eBook Packages: Springer Book Archive
