DOI: 10.1145/143369
ICS '92: Proceedings of the 6th International Conference on Supercomputing
Publisher:
Association for Computing Machinery, New York, NY, United States
Conference:
ICS '92: ACM SIGARCH International Conference on Supercomputing, Washington, D.C., USA, July 19-24, 1992
ISBN:
978-0-89791-485-7
Published:
01 August 1992

Evaluation of compiler optimizations for Fortran D on MIMD distributed memory machines

The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve ...
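
For orientation (a sketch of ours, not the Fortran D compiler's actual output): once an array is given a BLOCK decomposition over P processors, generated code must translate every global index into an owning processor and a local offset, roughly as follows.

    #include <stdio.h>

    /* Illustrative owner/offset computation for a BLOCK distribution
       of an N-element array over P processors. This is our sketch of
       the general idea, not the Fortran D compiler's code. */
    static int block_size(int n, int p) { return (n + p - 1) / p; }
    static int owner(int i, int n, int p) { return i / block_size(n, p); }
    static int local(int i, int n, int p) { return i % block_size(n, p); }

    int main(void) {
        int n = 100, p = 4;
        /* Global element 57 lives on processor 2 as local element 7. */
        printf("A(57) -> proc %d, local %d\n",
               owner(57, n, p), local(57, n, p));
        return 0;
    }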

Evaluation of compiler generated parallel programs on three multicomputers

Distributed memory parallel processors (DMPPs) have no hardware support for a global address space. However, conventional programs written in a sequential imperative language such as Fortran typically manipulate a few large arrays. The Oxygen compiler, ...

Automatic data mapping for distributed-memory parallel computers

The performance of a program on a distributed-memory parallel computer depends on the algorithms employed, the structure and speed of the machine's communication network, and the ways in which data are distributed to the processors. This paper addresses ...

Characterizing memory performance in vector multiprocessors

We propose a set of three memory performance measures directed at vector multiprocessors. One is the port reservation time, which is closely related to the commonly used memory bandwidth measure. The second is the vector fill time, the latency ...

Performance analysis of the CM-2, a massively parallel SIMD computer

The performance evaluation process for a massively parallel distributed memory SIMD computer is described generally. The performance in basic computation, grid communication, and computation with grid communication is analyzed. A practical performance ...

Evaluation of the lock mechanism in a snooping cache

This paper discusses the design concepts of a lock mechanism for a Parallel Inference Machine (the PIM/c prototype) and investigates the performance of the mechanism in detail.

Lock operations are extremely frequent on the PIM; however, lock contention ...
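
For context, a standard lock discipline on snooping-cache machines (a generic illustration, not the PIM/c mechanism) is test-and-test-and-set: spin on the locally cached value and touch the bus only when the lock appears free.

    #include <stdatomic.h>

    /* Test-and-test-and-set spin lock. The inner loop spins on the
       locally cached copy, so a waiting processor generates bus
       traffic only when the lock is released and the exchange is
       retried. Generic sketch, not the PIM/c design. */
    typedef struct { atomic_int locked; } spinlock_t;   /* init to {0} */

    static void spin_lock(spinlock_t *l) {
        for (;;) {
            while (atomic_load(&l->locked))           /* spin in cache */
                ;
            if (atomic_exchange(&l->locked, 1) == 0)  /* one bus op */
                return;
        }
    }

    static void spin_unlock(spinlock_t *l) {
        atomic_store(&l->locked, 0);
    }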

Processor allocation and loop scheduling on multiprocessor computers

This paper is concerned with the automatic exploitation of the parallelism detected in a sequential program. The target machine is a shared memory multiprocessor.

The main goal is minimizing the completion time of the program. To achieve this, one has ...
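
One classic policy in this design space is chunked self-scheduling, where processors claim fixed-size blocks of iterations from a shared counter; a minimal sketch of ours, not taken from the paper:

    #include <stdatomic.h>

    /* Chunked self-scheduling: each processor repeatedly claims CHUNK
       iterations from a shared counter until the loop is exhausted.
       Assumes next_iter starts at 0 for a single loop instance.
       body() is a hypothetical loop body. */
    #define CHUNK 16

    extern void body(int i);
    static atomic_int next_iter;

    void worker(int n_iters) {
        for (;;) {
            int lo = atomic_fetch_add(&next_iter, CHUNK);
            if (lo >= n_iters) return;
            int hi = lo + CHUNK < n_iters ? lo + CHUNK : n_iters;
            for (int i = lo; i < hi; i++)
                body(i);
        }
    }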

Low level scheduling using the hierarchical task graph

This paper introduces a new efficient instruction scheduling algorithm that can schedule across basic blocks. Scheduling globally, across basic blocks, is done by using an extension of the control flow graph (CFG) that combines both data and control ...

Deriving good transformations for mapping nested loops on hierarchical parallel machines in polynomial time

We present a computationally efficient method for deriving the most appropriate transformation and mapping of a nested loop for a given hierarchical parallel machine. This method is in the context of our systematic and general theory of unimodular loop ...

ABCL/onEM-4: a new software/hardware architecture for object-oriented concurrent computing on an extended dataflow supercomputer

Object-oriented software construction is becoming more and more prevalent, and parallel programming is no exception. In the context of parallel computation, it is often natural to model the computation as message passing between ...

Tolerating data access latency with register preloading

By exploiting fine grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first level memory which can severely restrict the ...
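
The idea can be mimicked by hand in source code (our sketch of the general technique, not the paper's compiler algorithm): issue the load for iteration i+1 before doing the work of iteration i, so the memory latency overlaps computation instead of stalling it.

    /* Hand-written analogue of register preloading: the load for the
       next iteration is started before the current iteration's work,
       so the access latency is hidden behind the arithmetic. */
    double sum_preloaded(const double *a, int n) {
        if (n == 0) return 0.0;
        double sum = 0.0;
        double cur = a[0];                /* preload first element */
        for (int i = 0; i < n - 1; i++) {
            double next = a[i + 1];       /* preload next iteration */
            sum += cur * cur;             /* compute with current */
            cur = next;
        }
        return sum + cur * cur;
    }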

Supercomputing and transputers

We study the degree to which parallel supercomputers can be scaled, discuss the measures necessary to achieve maximum scalability, and present a case study. To this end, a new class of “supermassively parallel architectures” is ...

Automatic software cache coherence through vectorization

Access latency in large-scale shared-memory multiprocessors is a concern since most (if not all) memory is one or more hops away through an interconnection network. Providing processors with one or more levels of cache is an accepted way to reduce the ...

Life span strategy—a compiler-based approach to cache coherence

In this paper, a cache coherence strategy with a combined software and hardware approach is proposed for large-scale multiprocessor systems. The new strategy has the scalability advantages of existing software strategies and does not rely on shared ...

Conflict-free access of vectors with power-of-two strides

An address mapping and an access order are presented for conflict-free access to vectors with any initial address and power-of-two strides. We show that for this conflict-free access it is necessary that the memory be unmatched and present an ...
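
To see the problem being solved (our illustration, not the paper's mapping): with M interleaved banks, M a power of two, a stride of 2^s touches only M / gcd(M, 2^s) distinct banks, so throughput collapses unless addresses are remapped.

    #include <stdio.h>

    /* Count how many distinct banks a strided vector access touches
       under plain low-order interleaving. Illustrates the conflict
       problem, not the paper's conflict-free mapping. */
    int main(void) {
        int banks = 8;
        for (int stride = 1; stride <= 8; stride <<= 1) {
            int used[8] = {0}, distinct = 0;
            for (int i = 0; i < 64; i++) {
                int b = (i * stride) % banks;    /* bank of element i */
                if (!used[b]) { used[b] = 1; distinct++; }
            }
            /* stride 1 -> 8 banks, 2 -> 4, 4 -> 2, 8 -> 1 */
            printf("stride %d -> %d of %d banks\n", stride, distinct, banks);
        }
        return 0;
    }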

Parallel program visualization using SIEVE.1

In this paper we introduce a new model for the design of performance analysis and visualization tools. The system integrates static code analysis, relational database designs and a spreadsheet model of interactive programming. This system provides a ...

The CODE 2.0 graphical parallel programming language

CODE 2.0 is a graphical parallel programming system that targets the three goals of ease of use, portability, and production of efficient parallel code. Ease of use is provided by an integrated graphical/textual interface, a powerful dynamic model of ...

Paralex: an environment for parallel programming in distributed systems

Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that need to be addressed in exploiting the parallelism ...

Exploiting heterogeneous parallelism on a multithreaded multiprocessor

This paper describes an integrated architecture, compiler, runtime, and operating system solution to exploiting heterogeneous parallelism. The architecture is a pipelined multi-threaded multiprocessor, enabling the execution of very fine (multiple ...

An architectural framework for migration from CISC to higher performance platforms

We describe a novel architectural framework that allows software applications written for a given Complex Instruction Set Computer (CISC) to migrate to a different, higher performance architecture, without a significant investment on the part of the ...

Manchester data-flow: a progress report

The Manchester Data-Flow Machine, MDFM, has evolved continuously during the past decade. By the time the prototype uniprocessor hardware system was decommissioned, in 1989, the putative multi-processor architecture comprised separate Processing Elements ...

Array abstractions using semantic analysis of trapezoid congruences

With the growing use of vector supercomputers, efficient and accurate data-structure analyses are needed. In this paper we propose to use the quite general framework of Cousot's abstract interpretation for the particular analysis of multi-...

A comprehensive approach to parallel data flow analysis

We present a comprehensive approach to performing data flow analysis in parallel. We first identify three types of parallelism inherent in the data flow solution process: independent-problem parallelism, separate-unit parallelism and algorithmic ...

Compile-time analysis of communicating processes

We present an algorithm for analyzing deadlock and for constructing sequentializations of a class of communicating sequential processes. The algorithm may be used for deadlock detection in parallel and distributed programs at compile time, or for ...
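
A minimal instance of what such an analysis must detect, rendered here with locks rather than the paper's communication primitives (our analogue of two crossed synchronous sends): two threads that block on each other in opposite order.

    #include <pthread.h>

    /* Schematic deadlock: each thread holds one resource and blocks
       waiting for the other's. Under the unlucky interleaving,
       neither can proceed. Illustration only, not the paper's
       process notation. */
    static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

    static void *t1(void *arg) {
        (void)arg;
        pthread_mutex_lock(&a);
        pthread_mutex_lock(&b);   /* waits for t2 ... */
        pthread_mutex_unlock(&b);
        pthread_mutex_unlock(&a);
        return 0;
    }

    static void *t2(void *arg) {
        (void)arg;
        pthread_mutex_lock(&b);
        pthread_mutex_lock(&a);   /* ... while t2 waits for t1 */
        pthread_mutex_unlock(&a);
        pthread_mutex_unlock(&b);
        return 0;
    }

    int main(void) {
        pthread_t x, y;
        pthread_create(&x, 0, t1, 0);
        pthread_create(&y, 0, t2, 0);
        pthread_join(x, 0);       /* may never return: deadlock */
        pthread_join(y, 0);
        return 0;
    }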

Register requirements of pipelined processors

To enable concurrent instruction execution, scientific computers generally rely on pipelining, which combines with faster system clocks to achieve greater throughput. Each concurrently executing instruction requires buffer space, usually implemented as ...

Benchmarking a vector-processor prototype based on multithreaded streaming/FIFO vector (MSFV) architecture

This paper presents benchmark results for a vector-processor prototype based on the MSFV (multithreaded streaming/FIFO vector) architecture. The MSFV architecture is single-chip oriented, and thus its main objective is to save the off-chip memory ...

On storage schemes for parallel array access

In parallel matrix manipulation operations, some data patterns need to be accessed in one memory cycle without conflict. Investigating the frequently used data patterns, we propose a powerful skewing scheme which allows most frequently used data ...
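
The textbook starting point for such schemes (shown for illustration; the paper's scheme is more general) is skewed storage: placing A[i][j] in bank (i + j) mod M makes every row and every column of an M x M matrix conflict-free.

    #include <stdio.h>

    /* Classic (i + j) mod M skewing: within any row (fixed i) and any
       column (fixed j) the bank numbers are all distinct, so either
       pattern can be fetched in one conflict-free cycle. */
    int main(void) {
        int M = 5;
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < M; j++)
                printf("%d ", (i + j) % M);   /* bank of A[i][j] */
            printf("\n");
        }
        return 0;
    }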

A general algorithm for data dependence analysis

With the development of ever more sophisticated data flow analysis algorithms, traditional data dependence tests based on elementary loop information will not be sufficient in the future. In this paper, quite general algorithms are presented for solving ...

On exact data dependence analysis

The GCD test and the Banerjee-Wolfe test are the two tests traditionally used to determine statement data dependence, subject to direction vectors, in automatic vectorization / parallelization of loops. In an earlier study [14] a sufficient condition ...
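
For readers new to the area, the GCD test is easily stated: A[a*i + b] and A[c*j + d] can name the same element only if gcd(a, c) divides d - b. A direct transcription (ours, for illustration):

    #include <stdio.h>

    static int gcd(int x, int y) {
        while (y) { int t = x % y; x = y; y = t; }
        return x;
    }

    /* GCD test: a necessary condition for A[a*i + b] and A[c*j + d]
       to touch the same element. It is conservative: "may depend"
       is not a proof of dependence. */
    int gcd_test_may_depend(int a, int b, int c, int d) {
        return (d - b) % gcd(a, c) == 0;
    }

    int main(void) {
        /* A[2i] vs A[2j + 1]: gcd(2, 2) = 2 does not divide 1,
           so the accesses are provably independent. */
        printf("%s\n", gcd_test_may_depend(2, 0, 2, 1)
                       ? "may depend" : "independent");
        return 0;
    }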

Array privatization for parallel execution of loops

In recent experiments, array privatization played a critical role in the successful parallelization of several real programs. This paper presents compiler algorithms for the program analysis required by this transformation. The paper also addresses issues in the ...
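
The transformation in miniature (a schematic of ours; the paper supplies the analysis that detects such arrays automatically): when every iteration writes a work array before reading it, the array carries no values across iterations, so each processor can be given a private copy and the loop run in parallel.

    /* Array privatization in miniature. The shared work array t
       serializes the i-loop as written; but every iteration writes
       t before reading it, so a compiler may give each processor a
       private copy of t and run the i-loop iterations in parallel. */
    double t[100];                            /* shared: blocks parallelism */

    void smooth(double (*x)[100], int n) {
        for (int i = 0; i < n; i++) {         /* parallelizable once t
                                                 is privatized */
            for (int j = 0; j < 100; j++)
                t[j] = 0.5 * x[i][j];         /* t written first ... */
            for (int j = 1; j < 100; j++)
                x[i][j] = t[j] + t[j - 1];    /* ... then read */
        }
    }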

Contributors
  • Rice University
  • University of Illinois Urbana-Champaign


Acceptance Rates

Overall acceptance rate: 584 of 2,055 submissions, 28%

Year       Submitted  Accepted  Rate
ICS '21          157        39   25%
ICS '15          160        40   25%
ICS '14          160        34   21%
ICS '13          202        43   21%
ICS '06          141        37   26%
ICS '03          171        36   21%
ICS '02          144        31   22%
ICS '01          133        45   34%
ICS '00          122        33   27%
ICS '99          180        57   32%
ICS '97          135        45   33%
ICS '96          116        50   43%
ICS '95          120        49   41%
ICS '94          114        45   39%
Overall        2,055       584   28%