DOI: 10.1145/181181
ICS '94: Proceedings of the 8th international conference on Supercomputing
ACM 1994 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
ICS94: International Conference on Supercomputing '94, Manchester, England, July 11-15, 1994
ISBN:
978-0-89791-665-3
Published:
16 July 1994

Abstract

No abstract available.

Article
Free
Distributed storage control unit for the Hitachi S-3800 multivector supercomputer

This paper discusses the storage control unit of the Hitachi S-3800 supercomputer series, which is capable of achieving 8 GFLOPS in each of up to four shared-memory multiprocessors. This storage control unit is distributed to the V-SCs (vector-processor-...

Article
Free
A model for dataflow based vector execution

Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research of the past decade has revealed several fundamental problems: Synchronization at the instruction level, token matching, coloring and re-labeling ...

Article
Free
Synchronized access to streams in SIMD vector multiprocessors

The synchronized and simultaneous access to several vectors that form a single stream is typical in SIMD vector multiprocessors as well as in MIMD superscalar multiprocessors with decoupled access. In this paper we propose a block-interleaved storage ...

Article
Free
The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization

Current parallelizing compilers cannot identify a significant fraction of fully parallel loops because they have complex or statically insufficiently defined access patterns. For this reason, we have developed the Privatizing DOALL test—a technique for ...

Article
Free
Reducing data communication overhead for DOACROSS loop nests

If the iterations of a loop nest cannot be partitioned into independent tasks, data communication for data dependence is inevitable in order to execute them on parallel machines. This kind of loop nest is referred to as a DOACROSS loop nest.

This paper ...

Article
Free
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors

We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for data locality and parallelism, reducing or eliminating false sharing. It ...

Article
Free
An evaluation of directory protocols for medium-scale shared-memory multiprocessors

This paper considers alternative directory protocols for providing cache coherence in shared-memory multiprocessors with 32 to 128 processors, where the state requirements of Dir_N may be considered too large. We consider Dir_iB (i = 1, 2, 4), Dir_N, Tristate (...

Article
Free
An evaluation of a compiler optimization for improving the performance of a coherence directory

Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both of these approaches have significant limitations. We examine the potential performance ...

Article
Free
Parallelisation of the SDEM distinct element stress analysis code on the KSR-1

The SDEM code models systems of interacting blocks of rock using the distinct element (DE) method, which represents these systems as discontinuums with each block acting under Newton's laws of motion. The data structures associated with the DE method ...

Article
Free
Ultrasonic wave propagation on parallel machines

“ULTSON” is a 2D code which solves the elastodynamic equations in a regular structured mesh. It has been developed at EDF to be used for non-destructive testing of nuclear power plants. Today, the code runs on classical architectures like Cray (YMP or ...

Article
Free
An efficient approach to computing fixpoints for complex program analysis

A chief source of inefficiency in program analysis using abstract interpretation comes from the fact that a large context (i.e., problem state) is propagated from node to node during the course of an analysis. This problem can be addressed and largely ...

Article
Free
Optimal local register allocation for a multiple-issue machine

This paper presents an algorithm that allocates registers optimally for straight-line code running on a generic multi-issue computer. On such a machine, an optimal register allocation is one that minimizes the number of issue slots that the code ...

Article
Free
Scheduling reductions

In order to detect more parallelism in scientific programs, one may extract a parallelism relative to reductions. This paper presents such a method which schedules programs with explicit computations of reductions. We describe the way the reductions are ...

Article
Free
A dominating set model for broadcast in all-port wormhole-routed 2D mesh networks

A new model for broadcast in wormhole-routed networks is proposed. The model uses and extends the concept of dominating sets in order to systematically develop efficient broadcast algorithms for all-port wormhole-routed systems, in which each node can ...

Article
Free
The interaction between virtual channel flow control and adaptive routing in wormhole networks

Multiprocessor interconnection networks based on low dimensional mesh or torus topologies and employing wormhole switching have become increasingly popular. Two concepts that have been proposed to improve the performance of such networks are Virtual ...

Article
Free
Fault-tolerant wormhole routing in tori

We present a method to enhance wormhole routing algorithms for deadlock-free fault-tolerant routing in tori. We consider arbitrarily-located faulty blocks and assume only local knowledge of faults. Messages are routed via shortest paths when there are ...

Article
Free
Performance of the CM-5 scalable file system

Assessing the performance and software interactions of emerging parallel input/output systems is a critical first step in input/output software tuning. Moreover, understanding the system response to well-understood, synthetic input/output patterns is ...

Article
Free
Communication in the KSR1 MPP: performance evaluation using synthetic workload experiments

We have developed an automatic technique for evaluating the communication performance of massively parallel processors (MPPs). Both communication latency and the amount of communication are investigated as a function of a few basic parameters that ...

Article
Free
Architecture implications of high-speed I/O for distributed-memory computers

We consider the problem of high-speed I/O for a single application running on multiple nodes of a distributed-memory parallel computer. Our model is that the parallel system is connected to an I/O system that provides the interface between the internal ...

Article
Free
Combining static and dynamic scheduling on distributed-memory multiprocessors

Loops are a large source of parallelism for many numerical applications. An important issue in the parallel execution of loops is how to schedule them so that the workload is well balanced among the processors. Most existing loop scheduling algorithms ...

Article
Free
An optimal upper bound on the minimal completion time in distributed supercomputing

We first consider an MIMD multiprocessor configuration with n processors. A parallel program, consisting of n processes, is executed on this system—one process per processor. The program terminates when all processes are completed. Due to ...

Article
Free
Compiler techniques for maximizing fine-grain and coarse-grain parallelism in loops with uniform dependences

In this paper, an approach to the problem of exploiting parallelism within nested loops is proposed. The proposed method first finds out all the initially independent computations, and then, based on them, identifies the valid partitioning bases to ...

Article
Free
Data and program restructuring of irregular applications for cache-coherent multiprocessor

Applications with irregular data structures such as sparse matrices or finite element meshes account for a large fraction of engineering and scientific applications. Domain decomposition techniques are commonly used to partition these applications to ...

Article
Free
Nonzero structure analysis

Because the efficiency of sparse codes is very much dependent on the size and structure of input data, peculiarities of the nonzero structures of sparse matrices must be accounted for in order to avoid unsatisfying performance. Usually, this implies ...

Article
Free
Techniques to overlap computation and communication in irregular iterative applications

There are many applications in CFD and structural analysis that can be more accurately modeled using unstructured grids. Parallelization of implicit methods for unstructured grids is a difficult and important problem. This paper deals with coloring ...

Article
Free
Performance analysis of a synchronous, circuit-switched interconnection cached network

In many parallel applications, each computation entity (process, thread etc.) switches the bulk of its communication between a small group of other entities. We call this phenomenon switching locality. The Interconnection Cached Network (ICN) is a ...

Article
Free
An analysis model on nonblocking multirate broadcast networks

Designing efficient interconnection networks with powerful connecting capability remains a key issue for parallel and distributed computing systems. Much progress has been made in nonblocking broadcast networks, which can realize all one-to-many ...

Article
Free
Exploiting cache affinity in software cache coherence

Cache affinity is important to the performance of scalable shared memory multiprocessors. For multiprocessors without hardware cache coherence support, software cache coherence is the only alternative. Most existing software cache schemes ignore cache ...

Article
Free
Performance evaluation of hybrid hardware and software distributed shared memory protocols

Hardware distributed shared memory (DSM) systems efficiently support fine grain sharing of data by maintaining coherence at the level of individual cache lines and providing automatic replication in processor caches. Software DSM systems, on the other ...

Article
Free
Limited area numerical weather forecasting on a massively parallel computer

A data-parallel implementation on a SIMD platform of an operational numerical weather forecast model is presented. The performances of two popular numerical techniques within these models are discussed, namely finite difference (gridpoint) methods and ...

Contributors
  • The University of Manchester
  • University of Versailles Saint-Quentin-en-Yvelines

Acceptance Rates

ICS '94 Paper Acceptance Rate: 45 of 114 submissions, 39%
Overall Acceptance Rate: 584 of 2,055 submissions, 28%

Year       Submitted   Accepted   Rate
ICS '21    157         39         25%
ICS '15    160         40         25%
ICS '14    160         34         21%
ICS '13    202         43         21%
ICS '06    141         37         26%
ICS '03    171         36         21%
ICS '02    144         31         22%
ICS '01    133         45         34%
ICS '00    122         33         27%
ICS '99    180         57         32%
ICS '97    135         45         33%
ICS '96    116         50         43%
ICS '95    120         49         41%
ICS '94    114         45         39%
Overall    2,055       584        28%