DOI: 10.1145/232973
ISCA '96: Proceedings of the 23rd Annual International Symposium on Computer Architecture
ACM 1996 Proceedings
Publisher: Association for Computing Machinery, New York, NY, United States
Conference:
ISCA '96: International Symposium on Computer Architecture, Philadelphia, Pennsylvania, USA, May 22-24, 1996
ISBN:
978-0-89791-786-5
Published:
15 May 1996
Sponsors:
SIGARCH, IEEE-CS TCCA
Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches

Pipeline stalls due to conditional branches represent one of the most significant impediments to realizing the performance potential of deeply pipelined, superscalar processors. Many branch predictors have been proposed to help alleviate this problem, ...
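The hybrid predictors the abstract refers to follow, in their general form, the McFarling tournament scheme: two component predictors plus a chooser table of 2-bit counters that learns, per branch, which component to trust. A minimal sketch of that general scheme, assuming a bimodal/gshare pair; the class name, table sizes, and indexing are illustrative, not taken from the paper:

```python
class HybridPredictor:
    """Tournament predictor: bimodal + gshare components, 2-bit chooser."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.bimodal = [2] * (1 << bits)   # 2-bit counters, init weakly taken
        self.gshare = [2] * (1 << bits)
        self.chooser = [2] * (1 << bits)   # >= 2 means "trust gshare"
        self.ghist = 0                     # global branch-history register

    def predict(self, pc):
        i = pc & self.mask
        g = (pc ^ self.ghist) & self.mask  # gshare: PC XOR global history
        p_bim = self.bimodal[i] >= 2
        p_gsh = self.gshare[g] >= 2
        return p_gsh if self.chooser[i] >= 2 else p_bim

    def update(self, pc, taken):
        i = pc & self.mask
        g = (pc ^ self.ghist) & self.mask
        p_bim = self.bimodal[i] >= 2
        p_gsh = self.gshare[g] >= 2
        # Train the chooser only when the components disagree.
        if p_bim != p_gsh:
            delta = 1 if p_gsh == taken else -1
            self.chooser[i] = min(3, max(0, self.chooser[i] + delta))
        # Train both components toward the actual outcome (saturating).
        step = 1 if taken else -1
        self.bimodal[i] = min(3, max(0, self.bimodal[i] + step))
        self.gshare[g] = min(3, max(0, self.gshare[g] + step))
        self.ghist = ((self.ghist << 1) | taken) & self.mask
```

The chooser is updated only on disagreement, so a branch that one component predicts well steers its own chooser entry toward that component over time.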

An analysis of dynamic branch prediction schemes on system workloads

Recent studies of dynamic branch prediction schemes rely almost exclusively on user-only simulations to evaluate performance. We find that an evaluation of these schemes with user and kernel references often leads to different conclusions. By analyzing ...

Correlation and aliasing in dynamic branch predictors

Previous branch prediction studies have relied primarily upon the SPECint89 and SPECint92 benchmarks for evaluation. Most of these benchmarks exercise a very small amount of code. As a consequence, the resources required by these schemes for accurate ...

Decoupled hardware support for distributed shared memory

This paper investigates hardware support for fine-grain distributed shared memory (DSM) in networks of workstations. To reduce design time and implementation cost relative to dedicated DSM systems, we decouple the functional hardware components of DSM ...

MGS: a multigrain shared memory system

Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to ...

COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors

Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) have a very high probability of experiencing failures. Tolerating node failures therefore becomes very important for these architectures particularly if ...

Evaluation of design alternatives for a multiprocessor microprocessor

In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors. In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and ...

Memory bandwidth limitations of future microprocessors

This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a ...

Missing the memory wall: the case for processor/memory integration

Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening ...

Don't use the page number, but a pointer to it

Most newly announced high performance microprocessors support 64-bit virtual addresses and the width of physical addresses is also growing. As a result, the size of the address tags in the L1 cache is increasing. The impact on chip area is ...

The difference-bit cache

The difference-bit cache is a two-way set-associative cache with an access time that is smaller than that of a conventional one and close or equal to that of a direct-mapped cache. This is achieved by noticing that the two tags for a set have to differ ...
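The access-time trick the abstract describes rests on the observation that the two tags stored in a set must differ in at least one bit, so remembering one such bit position (and its value in one way) lets a lookup select the likely way with a single-bit test, much like a direct-mapped access; the full tag comparison then confirms the hit. A minimal sketch of one set, assuming this reading of the abstract (the class name and representation are illustrative, not the paper's design):

```python
class DiffBitSet:
    """One set of a two-way cache using a recorded tag-difference bit."""

    def __init__(self):
        self.tags = [None, None]   # tags currently stored in way 0 and way 1
        self.diff_pos = 0          # position of a bit where the two tags differ
        self.diff_val0 = 0         # value of that bit in way 0's tag

    def _update_diff(self):
        t0, t1 = self.tags
        if t0 is not None and t1 is not None:
            # Two distinct tags mapped to the same set must differ somewhere;
            # record the lowest differing bit position.
            x = t0 ^ t1
            self.diff_pos = (x & -x).bit_length() - 1
            self.diff_val0 = (t0 >> self.diff_pos) & 1

    def lookup(self, tag):
        # Single-bit test selects the candidate way (direct-mapped-like speed);
        # the full tag compare then verifies the hit.
        way = 0 if ((tag >> self.diff_pos) & 1) == self.diff_val0 else 1
        return way if self.tags[way] == tag else None

    def fill(self, way, tag):
        self.tags[way] = tag
        self._update_diff()
```

Because the single-bit test uniquely distinguishes the two resident tags, the selected way is the only one that could possibly hit, so no second-way probe is needed.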

Understanding application performance on shared virtual memory systems

Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for ...

Application and architectural bottlenecks in large scale distributed shared memory machines

Many of the programming challenges encountered in small to moderate-scale hardware cache-coherent shared memory machines have been extensively studied. While work remains to be done, the basic techniques needed to efficiently program such machines have ...

Increasing cache port efficiency for dynamic superscalar microprocessors

The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single ...

High-bandwidth address translation for multiple-issue processors

In an effort to push the envelope of system performance, microprocessor designs are continually exploiting higher levels of instruction-level parallelism, resulting in increasing bandwidth demands on the address translation mechanism. Most current ...

DCD—disk caching disk: a new approach for boosting I/O performance

This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk, referred to as cache-disk, as a secondary disk cache to optimize ...

Polling watchdog: combining polling and interrupts for efficient message handling

Parallel systems supporting multithreading, or message passing in general, have typically used either polling or interrupts to handle incoming messages. Neither approach is ideal; either may lead to excessive overheads or message-handling latencies, ...

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized ...

Evaluation of multithreaded uniprocessors for commercial application environments

As memory speeds grow at a considerably slower rate than processor speeds, memory accesses are starting to dominate the execution time of processors, and this will likely continue into the future. This trend will be exacerbated by growing miss rates due ...

Performance comparison of ILP machines with cycle time evaluation

Many studies have investigated performance improvement through exploiting instruction-level parallelism (ILP) with a particular architecture. Unfortunately, these studies indicate performance improvement using the number of cycles that are required to ...

Rotating combined queueing (RCQ): bandwidth and latency guarantees in low-cost, high-performance networks

Network service guarantees not only provide significant performance benefits to distributed computing systems (more balanced resource utilization, fast fault recovery, and fair network access), but they are also essential for many new applications ...

A router architecture for real-time point-to-point networks

Parallel machines have the potential to satisfy the large computational demands of emerging real-time applications. These applications require a predictable communication network, where time-constrained traffic requires bounds on latency or throughput ...

Coherent network interfaces for fine-grain communication

Historically, processor accesses to memory-mapped device registers have been marked uncachable to insure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with ...

Informing memory operations: providing memory performance feedback in modern processors

Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the ...

Instruction prefetching of systems codes with layout optimized for reduced cache misses

High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large ...

Compiler and hardware support for cache coherence in large-scale multiprocessors: design considerations and performance study

In this paper, we study a hardware-supported, compiler directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. It can be adapted to various cache ...

Early experience with message-passing on the SHRIMP multicomputer

The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that ...

STiNG: a CC-NUMA computer system for the commercial marketplace

"STiNG" is a Cache Coherent Non-Uniform Memory Access (CC-NUMA) Multiprocessor designed and built by Sequent Computer Systems, Inc. It combines four processor Symmetric Multi-processor (SMP) nodes (called Quads), using a Scalable Coherent Interface (SCI)...

Contributors
  • University of Washington


Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
Year        Submitted   Accepted   Rate
ISCA '22        400         67     17%
ISCA '19        365         62     17%
ISCA '17        322         54     17%
ISCA '13        288         56     19%
ISCA '12        262         47     18%
ISCA '08        259         37     14%
ISCA '06        234         31     13%
ISCA '05        194         45     23%
ISCA '04        217         31     14%
ISCA '03        184         36     20%
ISCA '02        180         27     15%
ISCA '01        163         24     15%
ISCA '99        135         26     19%
Overall       3,203        543     17%