- Sponsor: SIGARCH
Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches
Pipeline stalls due to conditional branches represent one of the most significant impediments to realizing the performance potential of deeply pipelined, superscalar processors. Many branch predictors have been proposed to help alleviate this problem, ...
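The snippet above motivates hybrid prediction without showing the mechanism. As a minimal sketch of a generic tournament-style hybrid predictor (table sizes and component choices here are illustrative, not taken from the paper), a chooser table of saturating counters selects per branch between a bimodal predictor and a global-history gshare predictor:

```python
class TwoBitCounter:
    """Standard 2-bit saturating counter, initialized weakly not-taken."""
    def __init__(self):
        self.state = 1

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


class HybridPredictor:
    """Sketch of a tournament predictor: a chooser table picks, per branch,
    between a bimodal component (PC-indexed) and a gshare component
    (PC XOR global-history indexed)."""
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.bimodal = [TwoBitCounter() for _ in range(1 << bits)]
        self.gshare = [TwoBitCounter() for _ in range(1 << bits)]
        self.chooser = [TwoBitCounter() for _ in range(1 << bits)]
        self.history = 0  # global branch history register

    def predict(self, pc):
        i = pc & self.mask
        g = (pc ^ self.history) & self.mask
        use_gshare = self.chooser[i].predict()
        return self.gshare[g].predict() if use_gshare else self.bimodal[i].predict()

    def update(self, pc, taken):
        i = pc & self.mask
        g = (pc ^ self.history) & self.mask
        p_bim = self.bimodal[i].predict()
        p_gsh = self.gshare[g].predict()
        # Train the chooser toward whichever component was correct,
        # but only when the two components disagree.
        if p_bim != p_gsh:
            self.chooser[i].update(p_gsh == taken)
        self.bimodal[i].update(taken)
        self.gshare[g].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

The chooser is the key to context-switch resilience studied by such papers: after state is disturbed, the chooser can fall back on whichever component recovers accuracy first.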
An analysis of dynamic branch prediction schemes on system workloads
Recent studies of dynamic branch prediction schemes rely almost exclusively on user-only simulations to evaluate performance. We find that an evaluation of these schemes with user and kernel references often leads to different conclusions. By analyzing ...
Correlation and aliasing in dynamic branch predictors
Previous branch prediction studies have relied primarily upon the SPECint89 and SPECint92 benchmarks for evaluation. Most of these benchmarks exercise a very small amount of code. As a consequence, the resources required by these schemes for accurate ...
Decoupled hardware support for distributed shared memory
This paper investigates hardware support for fine-grain distributed shared memory (DSM) in networks of workstations. To reduce design time and implementation cost relative to dedicated DSM systems, we decouple the functional hardware components of DSM ...
MGS: a multigrain shared memory system
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to ...
COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors
Due to the increasing number of their components, Scalable Shared Memory Multiprocessors (SSMMs) have a very high probability of experiencing failures. Tolerating node failures therefore becomes very important for these architectures particularly if ...
Evaluation of design alternatives for a multiprocessor microprocessor
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors. In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and ...
Memory bandwidth limitations of future microprocessors
This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a ...
Missing the memory wall: the case for processor/memory integration
Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening ...
Don't use the page number, but a pointer to it
Most newly announced high performance microprocessors support 64-bit virtual addresses and the width of physical addresses is also growing. As a result, the size of the address tags in the L1 cache is increasing. The impact on chip area is ...
The difference-bit cache
The difference-bit cache is a two-way set-associative cache with an access time smaller than that of a conventional two-way cache and close to or equal to that of a direct-mapped cache. This is achieved by noticing that the two tags for a set have to differ ...
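Since the two tags in a set must differ in at least one bit, way selection can in principle be driven by a single stored bit position rather than two full tag comparisons. A toy sketch of that idea (the data structure and names are illustrative, not the paper's design):

```python
def first_diff_bit(tag_a, tag_b):
    """Index of the lowest bit where two (distinct) tags differ."""
    x = tag_a ^ tag_b
    assert x != 0, "the two tags in a set must differ"
    return (x & -x).bit_length() - 1


class DifferenceBitSet:
    """One set of a hypothetical two-way difference-bit cache: a single
    stored bit position selects the way, so way selection is as fast as a
    direct-mapped index; the full tag compare only confirms hit or miss."""
    def __init__(self, tag0, tag1):
        self.tags = [tag0, tag1]
        self.diff_bit = first_diff_bit(tag0, tag1)
        # Record which way holds a 1 at the difference-bit position.
        self.way_of_one = 0 if (tag0 >> self.diff_bit) & 1 else 1

    def lookup(self, tag):
        # Fast way select: examine one bit of the incoming tag.
        if (tag >> self.diff_bit) & 1:
            way = self.way_of_one
        else:
            way = 1 - self.way_of_one
        # Full comparison, off the way-selection critical path.
        return way, self.tags[way] == tag
```

Note that a miss can still route to either way; the full tag comparison catches it, which is why only way selection, not hit detection, is accelerated.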
Understanding application performance on shared virtual memory systems
Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for ...
Application and architectural bottlenecks in large scale distributed shared memory machines
Many of the programming challenges encountered in small to moderate-scale hardware cache-coherent shared memory machines have been extensively studied. While work remains to be done, the basic techniques needed to efficiently program such machines have ...
Increasing cache port efficiency for dynamic superscalar microprocessors
The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single ...
High-bandwidth address translation for multiple-issue processors
In an effort to push the envelope of system performance, microprocessor designs are continually exploiting higher levels of instruction-level parallelism, resulting in increasing bandwidth demands on the address translation mechanism. Most current ...
DCD—disk caching disk: a new approach for boosting I/O performance
This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk, referred to as cache-disk, as a secondary disk cache to optimize ...
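The core idea named in the snippet, using a small log disk as a secondary disk cache, can be modeled in a few lines: random small writes become sequential appends to the cache-disk, and an idle-time destage later moves the newest copy of each block to its home location. A toy model (all names and structures are illustrative, not the paper's implementation):

```python
class DCD:
    """Toy model of the Disk Caching Disk idea: writes are appended
    sequentially to a log (the cache-disk), and a background destage
    copies the latest version of each block to the data disk."""
    def __init__(self):
        self.data_disk = {}   # block -> data, in-place home locations
        self.cache_disk = []  # append-only log of (block, data) records
        self.log_index = {}   # block -> newest data still in the log

    def write(self, block, data):
        # Fast path: a sequential append, regardless of block address.
        self.cache_disk.append((block, data))
        self.log_index[block] = data

    def read(self, block):
        # The newest copy of a block may still live in the log.
        if block in self.log_index:
            return self.log_index[block]
        return self.data_disk.get(block)

    def destage(self):
        # Idle time: write latest versions in place, then reclaim the log.
        self.data_disk.update(self.log_index)
        self.cache_disk.clear()
        self.log_index.clear()
```

The win comes from the write path: many scattered small writes cost one sequential log write each, and only the final version of each block pays the random-write cost at destage time.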
Polling watchdog: combining polling and interrupts for efficient message handling
Parallel systems supporting multithreading, or message passing in general, have typically used either polling or interrupts to handle incoming messages. Neither approach is ideal; either may lead to excessive overheads or message-handling latencies, ...
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized ...
Evaluation of multithreaded uniprocessors for commercial application environments
As memory speeds grow at a considerably slower rate than processor speeds, memory accesses are starting to dominate the execution time of processors, and this will likely continue into the future. This trend will be exacerbated by growing miss rates due ...
Performance comparison of ILP machines with cycle time evaluation
Many studies have investigated performance improvement through exploiting instruction-level parallelism (ILP) with a particular architecture. Unfortunately, these studies indicate performance improvement using the number of cycles that are required to ...
Rotating combined queueing (RCQ): bandwidth and latency guarantees in low-cost, high-performance networks
Network service guarantees not only provide significant performance benefits to distributed computing systems (more balanced resource utilization, fast fault recovery, and fair network access), but they are also essential for many new applications ...
A router architecture for real-time point-to-point networks
Parallel machines have the potential to satisfy the large computational demands of emerging real-time applications. These applications require a predictable communication network, where time-constrained traffic requires bounds on latency or throughput ...
Coherent network interfaces for fine-grain communication
Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with ...
Informing memory operations: providing memory performance feedback in modern processors
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the ...
Instruction prefetching of systems codes with layout optimized for reduced cache misses
High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large ...
Compiler and hardware support for cache coherence in large-scale multiprocessors: design considerations and performance study
In this paper, we study a hardware-supported, compiler directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. It can be adapted to various cache ...
Early experience with message-passing on the SHRIMP multicomputer
- Edward W. Felten,
- Richard D. Alpert,
- Angelos Bilas,
- Matthias A. Blumrich,
- Douglas W. Clark,
- Stefanos N. Damianakis,
- Cezary Dubnicki,
- Liviu Iftode,
- Kai Li
The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that ...
STiNG: a CC-NUMA computer system for the commercial marketplace
"STiNG" is a Cache Coherent Non-Uniform Memory Access (CC-NUMA) Multiprocessor designed and built by Sequent Computer Systems, Inc. It combines four processor Symmetric Multi-processor (SMP) nodes (called Quads), using a Scalable Coherent Interface (SCI)...
Index Terms
- Proceedings of the 23rd annual international symposium on Computer architecture