Volume 16, Issue 2
May 1988
Special Issue: Proceedings of the 15th annual international symposium on Computer Architecture
Publisher: Association for Computing Machinery, New York, NY, United States
ISSN: 0163-5964
article
Free
Critical issues in mapping neural networks on message-passing multicomputers

Connectionist models, such as artificial neural systems, offer an intrinsically concurrent computational paradigm. We investigate the architectural requirements for efficiently simulating large neural networks on a multicomputer system with thousands of ...

article
Free
Multinomial conjunctoid statistical learning machines

Multinomial Conjunctoids are supervised statistical modules that learn the relationships among binary events. The multinomial conjunctoid algorithm precludes the following problems that occur in existing feedforward multi-layered neural networks: (a) ...

article
Free
A bit-plane architecture for optical computing with two-dimensional symbolic substitution

A novel architecture based on optical technology is presented for constructing parallel computers. The architecture exploits optics for its ultra-high speed, massive parallelism, and dense connectivity. The processing is based on a new technique called ...

article
Free
The reconfigurable arithmetic processor

The Reconfigurable Arithmetic Processor (RAP) is an arithmetic processing node for a message-passing, MIMD concurrent computer. It incorporates on one chip several serial, 64 bit floating point arithmetic units connected by a switching network. By ...

article
Free
The performance potential of multiple functional unit processors

In this paper, we look at the interaction of pipelining and multiple functional units in single processor machines. When implementing a high performance machine, a number of hardware techniques may be used to improve the performance of the final system. ...

article
Free
Exploiting parallel microprocessor microarchitectures with a compiler code generator

With advances in VLSI technology, microprocessor designers can provide more microarchitectural parallelism to increase performance. We have identified four major forms of such parallelism: multiple microoperations issued per cycle, multiple result ...

article
Free
Analysis of memory referencing behavior for design of local memories

Memory referencing behavior is analyzed via the study of traces for the purpose of developing new local memory structures and management techniques. A novel trace processing technique called flattening reduces the dependence of the results on the ...

article
Free
Performance evaluation of on-chip register and cache organizations

Chip area is a critical resource in the design of VLSI processors. There are many different alternative designs that could fill this chip area. This paper compares several different local memory organizations applicable for single-chip processors. ...

article
Free
On the inclusion properties for multi-level cache hierarchies

The inclusion property is essential in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies. We give some necessary and sufficient conditions for imposing the inclusion property for fully- and set-associative ...

article
Free
A simulation study of two-level caches

We report on a trace-driven simulation study to examine the effect of a two-level cache hierarchy in uniprocessors. A simulation model of a multiple-cycle-per-instruction processor was constructed to estimate the total cycles required to execute a ...

article
Free
Hyperswitch network for the hypercube computer

The performance of a parallel algorithm depends in large part on the interconnection topology of the multicomputer system. The method presented in this paper realizes a kind of interconnection network, called a hyperswitch network, that is achieved ...

article
Free
Analysis of bus hierarchies for multiprocessors

In order to build large shared-memory multiprocessor systems that take advantage of current hardware-enforced cache coherence protocols, an interconnection network is needed that acts logically as a single bus while avoiding the electrical loading ...

article
Free
Extra group network: a cost-effective fault-tolerant multistage interconnection network

This paper introduces a new class of fault-tolerant multistage interconnection networks, dubbed Extra Group Networks (EGNs). An EGN-m of size N is designed to have m + 1 unique path multistage networks of size N/m. This approach of constructing the ...

article
Free
A partial-multiple-bus computer structure with improved cost effectiveness

This paper addresses the design and performance analysis of partial-multiple-bus interconnection networks. One such structure, called processor-oriented partial-multiple-bus (or PPMB), is proposed. It serves as an alternative to the conventional ...

article
Free
Flagship: a parallel architecture for declarative programming

The Flagship project aims to produce a computing technology based on the declarative style of programming. A major component of that technology is the design for a parallel machine which can efficiently exploit the implicit parallelism in declarative ...

article
Free
Toward a dataflow/von Neumann hybrid architecture

Dataflow architectures offer the ability to trade program level parallelism in order to overcome machine level latency. Dataflow further offers a uniform synchronization paradigm, representing one end of a spectrum wherein the unit of scheduling is a ...

article
Free
Resource requirements of dataflow programs

Parallel execution of programs requires more resources and more complex resource management than sequential execution. If concurrent tasks can be spawned dynamically, programs may require an inordinate amount of resources when the potential parallelism ...

article
Free
Priority-driven, preemptive I/O controllers for real-time systems

Current I/O controller architectures inhibit the use of priority-driven preemptive scheduling algorithms that can guarantee hard deadlines in real-time systems. This paper examines the effect of three I/O controller architectures upon schedulable ...

article
Free
A kernel-independent, pipelined architecture for real-time 2-D convolution

Existing architectures for 2-D convolution suffer from such drawbacks as inflexibility with respect to image and/or kernel sizes (systolic arrays) or data distribution and collection overhead (SIMD processor arrays). This paper introduces a pipelined ...

article
Free
Exploiting bit level concurrency in real-time geometric feature extractions

Geometric feature extraction can be characterized as a computationally intensive task in the environment of real-time automated vision systems, requiring algorithms with a high degree of parallelism and pipelining under the raster-scan I/O constraint. ...

article
Free
Measuring VAX 8800 performance with a histogram hardware monitor

This paper reports the results of a study of VAX 8800 processor performance using a hardware monitor that collects histograms of the processor's micro-PC and memory bus status. The monitor keeps a count of all machine cycles executed at each micro-PC ...

article
Free
Multiprocessor cache analysis using ATUM

The design of high-performance multiprocessor systems necessitates a careful analysis of the memory system performance of parallel programs. Lacking multiprocessor address traces, previous multiprocessor performance studies using analytical models had ...

article
Free
Trade-offs between devices and paths in achieving disk interleaving

There is a continuing need to improve the performance of disk subsystems, and one of the key factors of a disk subsystem's performance is the data transfer rate. While it is clear that increasing the data transfer rate would reduce the service time for ...

article
Free
Design of a concurrent computer for solving systems of linear equations

In this paper we describe the design of a systolic array of Householder processor elements, which is dedicated to the solution of large (dense) systems of linear equations. The array is capable of executing two different algorithms. One for the solution ...

article
Free
The white dwarf: a high-performance application-specific processor

This paper presents the design and implementation of a high-performance special-purpose processor, called The White Dwarf, for accelerating finite element analysis algorithms. The White Dwarf CPU contains two Am29325 32-bit floating-point processors and ...

article
Free
Solving partial differential equations in a data-driven multiprocessor environment

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as ...

article
Free
Scrambled storage for parallel memory systems

A scrambled storage scheme is proposed for storing arrays of N×N elements in N = 2^n parallel memory modules to allow conflict-free access to various array partitions. It is shown that the scheme allows conflict-free access to rows, columns, square ...

article
Free
The architecture of a Linda coprocessor

We describe the architecture of a coprocessor that supports the communication primitives of the Linda parallel programming environment in hardware. The coprocessor is a critical element in the architecture of the Linda Machine, an MIMD parallel ...

article
Free
Deadlock avoidance for systolic communication

Under the systolic communication model, each cell (or processor) in a parallel processing system can operate directly on data residing at the cell's input queues and move computed results directly to the cell's output queues. Incoming and outgoing ...

article
Free
Cache performance of vector processors

An instruction-level simulator for the IBM 3090 with VF (vector facility) has been developed for studying the performance of vector processors and their memory hierarchies. Initial use of the simulator is to understand the program locality of real ...
