ABSTRACT
In throughput computing, data elements can be processed independently by a large number of threads running similar programs, referred to as kernels, or as shaders for graphics-specific workloads. A throughput computing device such as a GPU requires task latency tolerance, to hold the contexts of outstanding threads, and data latency tolerance, to hold buffer space for memory requests issued by those threads. Threads are grouped into thread groups. The register file, and hence the number of outstanding thread groups it can support, should be sized according to the ratio of compute resources to load/store units; that ratio should in turn reflect the balance between ALU and load/store instructions in the target workload.
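The sizing rule above can be illustrated with a back-of-the-envelope sketch. All numbers below (memory latency, ALU-to-load ratio, thread-group width, registers per thread) are assumed values for illustration, not figures from the paper; the sketch only shows how the ratio of ALU work to load/store instructions determines how many thread groups, and hence how large a register file, are needed to hide memory latency.

```python
# Hedged sizing sketch with assumed parameters (not from the source).
# Idea: to hide memory latency, enough thread groups must be resident that
# the ALUs stay busy while some groups wait on outstanding loads.

def thread_groups_needed(mem_latency_cycles, alu_ops_per_load,
                         issue_cycles_per_op=1):
    """Minimum resident thread groups so ALU work covers one load's latency.

    While one group waits mem_latency_cycles on a load, each other group
    can issue alu_ops_per_load ALU ops (issue_cycles_per_op cycles each)
    before it, too, blocks on its next load.
    """
    cycles_of_work_per_group = alu_ops_per_load * issue_cycles_per_op
    # ceiling division, plus one for the group that is itself waiting
    return 1 + -(-mem_latency_cycles // cycles_of_work_per_group)

def register_file_size(groups, threads_per_group, regs_per_thread):
    """Registers needed to hold the contexts of all outstanding groups."""
    return groups * threads_per_group * regs_per_thread

# Example: 400-cycle memory latency, 20 ALU ops per load on average.
groups = thread_groups_needed(mem_latency_cycles=400, alu_ops_per_load=20)
print(groups)                              # 21 resident thread groups
print(register_file_size(groups, 32, 16))  # 10752 registers
```

Note how a more ALU-heavy workload (larger `alu_ops_per_load`) needs fewer resident groups, so the same register file tolerates more latency; a load-heavy workload pushes the sizing the other way.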
Index Terms
- Latency tolerance for throughput computing