SIGARCH: Vol 16, No 2

Volume 16, Issue 2, May 1988. Special Issue: Proceedings of the 15th Annual International Symposium on Computer Architecture

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 0163-5964

article

Free

Critical issues in mapping neural networks on message-passing multicomputers

Pages 3–11 https://doi.org/10.1145/633625.52401

Connectionist models, such as artificial neural systems, offer an intrinsically concurrent computational paradigm. We investigate the architectural requirements for efficiently simulating large neural networks on a multicomputer system with thousands of ...

article

Free

Multinomial conjunctoid statistical learning machines

Pages 12–17 https://doi.org/10.1145/633625.52402

Multinomial Conjunctoids are supervised statistical modules that learn the relationships among binary events. The multinomial conjunctoid algorithm precludes the following problems that occur in existing feedforward multi-layered neural networks: (a) ...

article

Free

A bit-plane architecture for optical computing with two-dimensional symbolic substitution

Pages 18–27 https://doi.org/10.1145/633625.52403

A novel architecture based on optical technology is presented for constructing parallel computers. The architecture exploits optics for its ultra-high speed, massive parallelism, and dense connectivity. The processing is based on a new technique called ...

article

Free

The reconfigurable arithmetic processor

Pages 30–36 https://doi.org/10.1145/633625.52404

The Reconfigurable Arithmetic Processor (RAP) is an arithmetic processing node for a message-passing, MIMD concurrent computer. It incorporates on one chip several serial, 64 bit floating point arithmetic units connected by a switching network. By ...

article

Free

The performance potential of multiple functional unit processors

Pages 37–44 https://doi.org/10.1145/633625.52405

In this paper, we look at the interaction of pipelining and multiple functional units in single processor machines. When implementing a high performance machine, a number of hardware techniques may be used to improve the performance of the final system. ...

article

Free

Exploiting parallel microprocessor microarchitectures with a compiler code generator

Pages 45–53 https://doi.org/10.1145/633625.52406

With advances in VLSI technology, microprocessor designers can provide more microarchitectural parallelism to increase performance. We have identified four major forms of such parallelism: multiple microoperations issued per cycle, multiple result ...

article

Free

Analysis of memory referencing behavior for design of local memories

Pages 56–63 https://doi.org/10.1145/633625.52407

Memory referencing behavior is analyzed via the study of traces for the purpose of developing new local memory structures and management techniques. A novel trace processing technique called flattening reduces the dependence of the results on the ...

article

Free

Performance evaluation of on-chip register and cache organizations

Pages 64–72 https://doi.org/10.1145/633625.52408

Chip area is a critical resource in the design of VLSI processors. There are many different alternative designs that could fill this chip area. This paper compares several different local memory organizations applicable for single-chip processors. ...

article

Free

On the inclusion properties for multi-level cache hierarchies

Pages 73–80 https://doi.org/10.1145/633625.52409

The inclusion property is essential in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies. We give some necessary and sufficient conditions for imposing the inclusion property for fully- and set-associative ...

article

Free

A simulation study of two-level caches

Pages 81–88 https://doi.org/10.1145/633625.52410

We report on a trace-driven simulation study to examine the effect of a two-level cache hierarchy in uniprocessors. A simulation model of a multiple-cycle-per-instruction processor was constructed to estimate the total cycles required to execute a ...

article

Free

Hyperswitch network for the hypercube computer

Pages 90–99 https://doi.org/10.1145/633625.52411

The performance of a parallel algorithm depends in large part on the interconnection topology of the multicomputer system. The method presented in this paper realizes a kind of interconnection network, called a hyperswitch network, that is achieved ...

article

Free

Analysis of bus hierarchies for multiprocessors

Pages 100–107 https://doi.org/10.1145/633625.52412

In order to build large shared-memory multiprocessor systems that take advantage of current hardware-enforced cache coherence protocols, an interconnection network is needed that acts logically as a single bus while avoiding the electrical loading ...

article

Free

Extra group network: a cost-effective fault-tolerant multistage interconnection network

Pages 108–115 https://doi.org/10.1145/633625.52413

This paper introduces a new class of fault-tolerant multistage interconnection networks, dubbed Extra Group Networks (EGNs). An EGN-m of size N is designed to have m + 1 unique path multistage networks of size N/m. This approach of constructing the ...

article

Free

A partial-multiple-bus computer structure with improved cost effectiveness

Pages 116–122 https://doi.org/10.1145/633625.52414

This paper addresses the design and performance analysis of partial-multiple-bus interconnection networks. One such structure, called processor-oriented partial-multiple-bus (or PPMB), is proposed. It serves as an alternative to the conventional ...

article

Free

Flagship: a parallel architecture for declarative programming

Pages 124–130 https://doi.org/10.1145/633625.52415

The Flagship project aims to produce a computing technology based on the declarative style of programming. A major component of that technology is the design for a parallel machine which can efficiently exploit the implicit parallelism in declarative ...

article

Free

Toward a dataflow/von Neumann hybrid architecture

R. A. Iannucci

Pages 131–140 https://doi.org/10.1145/633625.52416

Dataflow architectures offer the ability to trade program level parallelism in order to overcome machine level latency. Dataflow further offers a uniform synchronization paradigm, representing one end of a spectrum wherein the unit of scheduling is a ...

article

Free

Resource requirements of dataflow programs

Pages 141–150 https://doi.org/10.1145/633625.52417

Parallel execution of programs requires more resources and more complex resource management than sequential execution. If concurrent tasks can be spawned dynamically, programs may require an inordinate amount of resources when the potential parallelism ...

article

Free

Priority-driven, preemptive I/O controllers for real-time systems

Pages 152–159 https://doi.org/10.1145/633625.52418

Current I/O controller architectures inhibit the use of priority-driven preemptive scheduling algorithms that can guarantee hard deadlines in real-time systems. This paper examines the effect of three I/O controller architectures upon schedulable ...

article

Free

A kernel-independent, pipelined architecture for real-time 2-D convolution

Pages 160–166 https://doi.org/10.1145/633625.52419

Existing architectures for 2-D convolution suffer from such drawbacks as inflexibility with respect to image and/or kernel sizes (systolic arrays) or data distribution and collection overhead (SIMD processor arrays). This paper introduces a pipelined ...

article

Free

Exploiting bit level concurrency in real-time geometric feature extractions

Pages 167–174 https://doi.org/10.1145/633625.52420

Geometric feature extraction can be characterized as a computationally intensive task in the environment of real-time automated vision systems requiring algorithms with a high degree of parallelism and pipelining under the raster-scan I/O constraint. ...

article

Free

Measuring VAX 8800 performance with a histogram hardware monitor

Pages 176–185 https://doi.org/10.1145/633625.52421

This paper reports the results of a study of VAX 8800 processor performance using a hardware monitor that collects histograms of the processor's micro-PC and memory bus status. The monitor keeps a count of all machine cycles executed at each micro-PC ...

article

Free

Multiprocessor cache analysis using ATUM

Pages 186–195 https://doi.org/10.1145/633625.52422

The design of high-performance multiprocessor systems necessitates a careful analysis of the memory system performance of parallel programs. Lacking multiprocessor address traces, previous multiprocessor performance studies using analytical models had ...

article

Free

Trade-offs between devices and paths in achieving disk interleaving

Pages 196–201 https://doi.org/10.1145/633625.52423

There is a continuing need to improve the performance of disk subsystems, and one of the key factors of a disk subsystem's performance is the data transfer rate. While it is clear that increasing the data transfer rate would reduce the service time for ...

article

Free

Design of a concurrent computer for solving systems of linear equations

Pages 204–211 https://doi.org/10.1145/633625.52424

In this paper we describe the design of a systolic array of Householder processor elements, which is dedicated to the solution of large (dense) systems of linear equations. The array is capable of executing two different algorithms. One for the solution ...

article

Free

The white dwarf: a high-performance application-specific processor

Pages 212–222 https://doi.org/10.1145/633625.52425

This paper presents the design and implementation of a high-performance special-purpose processor, called The White Dwarf, for accelerating finite element analysis algorithms. The White Dwarf CPU contains two Am29325 32-bit floating-point processors and ...

article

Free

Solving partial differential equations in a data-driven multiprocessor environment

Pages 223–230 https://doi.org/10.1145/633625.52426

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as ...

article

Free

Scrambled storage for parallel memory systems

D. Lee

Pages 232–239 https://doi.org/10.1145/633625.52427

A scrambled storage scheme is proposed for storing arrays of N × N elements in N = 2ⁿ parallel memory modules to allow conflict-free access to various array partitions. It is shown that the scheme allows conflict-free access to rows, columns, square ...

article

Free

The architecture of a Linda coprocessor

Pages 240–249 https://doi.org/10.1145/633625.52428

We describe the architecture of a coprocessor that supports the communication primitives of the Linda parallel programming environment in hardware. The coprocessor is a critical element in the architecture of the Linda Machine, an MIMD parallel ...

article

Free

Deadlock avoidance for systolic communication

H. T. Kung

Pages 252–260 https://doi.org/10.1145/633625.52429

Under the systolic communication model, each cell (or processor) in a parallel processing system can operate directly on data residing at the cell's input queues and move computed results directly to the cell's output queues. Incoming and outgoing ...

article

Free

Cache performance of vector processors

Pages 261–268 https://doi.org/10.1145/633625.52430

An instruction-level simulator for IBM 3090 with VF (vector facility) has been developed for studying the performance of vector processors and their memory hierarchies. Initial use of the simulator is to understand the program locality of real ...
