# Lecture Notes in Computer Science

4339

Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

#### **Editorial Board**

David Hutchison

Lancaster University, UK

Takeo Kanade

Carnegie Mellon University, Pittsburgh, PA, USA

Josef Kittler

University of Surrey, Guildford, UK

Jon M. Kleinberg

Cornell University, Ithaca, NY, USA

Friedemann Mattern

ETH Zurich, Switzerland

John C. Mitchell

Stanford University, CA, USA

Moni Naor

Weizmann Institute of Science, Rehovot, Israel

Oscar Nierstrasz

University of Bern, Switzerland

C. Pandu Rangan

Indian Institute of Technology, Madras, India

Bernhard Steffen

University of Dortmund, Germany

Madhu Sudan

Massachusetts Institute of Technology, MA, USA

Demetri Terzopoulos

University of California, Los Angeles, CA, USA

Doug Tygar

University of California, Berkeley, CA, USA

Moshe Y. Vardi

Rice University, Houston, TX, USA

Gerhard Weikum

Max-Planck Institute of Computer Science, Saarbruecken, Germany

Eduard Ayguadé, Gerald Baumgartner, J. Ramanujam, P. Sadayappan (Eds.)

# Languages and Compilers for Parallel Computing

18th International Workshop, LCPC 2005

Hawthorne, NY, USA, October 20-22, 2005

Revised Selected Papers



#### Volume Editors

Eduard Ayguadé
Computer Architecture Department
Universitat Politècnica de Catalunya
08034 Barcelona, Catalunya, Spain
E-mail: eduard@cepba.upc.es

Gerald Baumgartner
Department of Computer Science
Louisiana State University
Baton Rouge, LA 70803, USA
E-mail: gb@csc.lsu.edu

J. Ramanujam
Department of Electrical and Computer Engineering
Louisiana State University
Baton Rouge, LA 70803, USA
E-mail: jxr@ece.lsu.edu

P. Sadayappan
Department of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210, USA
E-mail: saday@cis.ohio-state.edu

Library of Congress Control Number: 2006939009

CR Subject Classification (1998): D.3, D.1.3, F.1.2, B.2.1, C.2.4, C.2, E.1, D.4

LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

ISSN 0302-9743

ISBN-10 3-540-69329-7 Springer Berlin Heidelberg New York

ISBN-13 978-3-540-69329-1 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media

springer.com

© Springer-Verlag Berlin Heidelberg 2006

Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India

Printed on acid-free paper

SPIN: 11967729 06/3142 5 4 3 2 1 0

## **Preface**

The 18th International Workshop on Languages and Compilers for Parallel Computing was scheduled to be held in New Orleans, Louisiana, in October 2005. Because of the devastation caused by Hurricane Katrina, the meeting had to be moved; it was held instead in Hawthorne, New York, thanks to help from IBM. The workshop is an annual forum for leading research groups to present their current research activities and latest results, covering languages, compiler techniques, runtime environments, and compiler-related performance evaluation for parallel and high-performance computing. Sixty-five researchers from Canada, France, Japan, Korea, P.R. China, Spain, Switzerland, Taiwan, the UK, and the USA attended the workshop.

Thirty-four research papers (26 regular papers and eight short papers) were presented at the workshop. These papers were reviewed by the Program Committee; external reviewers were used as needed. The authors then received additional comments during the workshop. The versions revised after the workshop are assembled in these final proceedings.

We thank Siddhartha Chatterjee from the IBM T.J. Watson Research Center for his keynote talk titled "The Changing Landscape of Parallel Computing." The workshop included a special session titled "High-Productivity Languages for HPC: Compiler Challenges" consisting of invited talks on the three languages being developed by the DARPA High-Productivity Computing Systems (HPCS) vendors. The talks were given by Steve Deitz (from Cray on the language Chapel), Vivek Sarkar (from IBM on the language X10), and David Chase (from Sun on the language Fortress). Frederica Darema gave a presentation during the workshop banquet about the proposed Dynamic Data-Driven Applications Systems (DDDAS) program at the US National Science Foundation.

The workshop was sponsored by the US National Science Foundation and by International Business Machines Corporation. Their generous contribution is greatly appreciated. We appreciate the assistance offered by the staff in the Department of Computer Science and Engineering at the Ohio State University and thank Alex Ramirez of Universitat Politècnica de Catalunya (Spain) for generous help with the paper submission and review software. Our special thanks go to the LCPC 2005 Program Committee and the external reviewers for their efforts in reviewing the submissions. Advice and suggestions from both the Steering Committee and the Program Committee are much appreciated. Finally, we wish to thank all the authors and participants for their contributions and lively discussions, which made the workshop a success.

Eduard Ayguadé, Gerald Baumgartner, J. (Ram) Ramanujam, P. (Saday) Sadayappan

## Organization

## Committees

General/Program Co-chairs:

Eduard Ayguadé

(Universitat Politècnica de Catalunya, Spain)

Gerald Baumgartner

(Louisiana State University, USA)

J. (Ram) Ramanujam

(Louisiana State University, USA)

P. (Saday) Sadayappan

(The Ohio State University, USA)

Program Committee:

Nancy Amato

(Texas A&M University, USA)

Gheorghe Almási

(IBM Thomas J. Watson Research Center, USA)

Eduard Ayguadé

(Universitat Politècnica de Catalunya, Spain)

Gerald Baumgartner

(Louisiana State University, USA)

Calin Cascaval

(IBM Thomas J. Watson Research Center, USA)

Rudolf Eigenmann

(Purdue University, USA)

Zhiyuan Li

(Purdue University, USA)

Sam Midkiff

(Purdue University, USA)

J. (Ram) Ramanujam

(Louisiana State University, USA)

Lawrence Rauchwerger

(Texas A&M University, USA)

P. (Saday) Sadayappan

(The Ohio State University, USA)

Bjarne Stroustrup

(Texas A&M University, USA)

Peng Wu

(IBM Thomas J. Watson Research Center, USA)

Local Organizing Committee:

Gheorghe Almási

(IBM Thomas J. Watson Research Center, USA)

Calin Cascaval

(IBM Thomas J. Watson Research Center, USA)

Peng Wu

(IBM Thomas J. Watson Research Center, USA)

Steering Committee:

Utpal Banerjee

(Intel Corporation, USA)

David Gelernter

(Yale University, USA)

Alex Nicolau

(University of California, Irvine, USA)

David Padua

(University of Illinois at Urbana-Champaign, USA)

## **Sponsors**

National Science Foundation, USA

International Business Machines Corporation

## **Table of Contents**

| Title | Page |
|-------|------|
| Revisiting Graph Coloring Register Allocation: A Study of the Chaitin-Briggs and Callahan-Koblenz Algorithms | 1 |
| Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design | 17 |
| Manipulating MAXLIVE for Spill-Free Register Allocation | 32 |
| Optimizing Packet Accesses for a Domain Specific Language on Network Processors | 47 |
| Array Replication to Increase Parallelism in Applications Mapped to Configurable Architectures | 62 |
| Generation of Control and Data Flow Graphs from Scheduled and Pipelined Assembly Code | 76 |
| Applying Data Copy to Improve Memory Performance of General Array Computations | 91 |
| A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion | 106 |
| Optimizing Matrix Multiplication with a Classifier Learning System<br>Xiaoming Li and María Jesús Garzarán | 121 |
| A Language for the Compact Representation of Multiple Program Versions | 136 |
| Efficient Computation of May-Happen-in-Parallel Information for Concurrent Java Programs | 152 |
| Evaluating the Impact of Thread Escape Analysis on a Memory Consistency Model-Aware Compiler<br>Chi-Leung Wong, Zehra Sura, Xing Fang, Kyungwoo Lee, Samuel P. Midkiff, Jaejin Lee, and David Padua | 170 |
| Concurrency Analysis for Parallel Programs with Textually Aligned Barriers | 185 |
| Titanium Performance and Potential: An NPB Experimental Study<br>Kaushik Datta, Dan Bonachea, and Katherine Yelick | 200 |
| Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations | 215 |
| Automatic Measurement of Instruction Cache Capacity | 230 |
| Combined ILP and Register Tiling: Analytical Model and Optimization Framework | 244 |
| Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization | 259 |
| Testing Speculative Work in a Lazy/Eager Parallel Functional Language | 274 |
| Loop Selection for Thread-Level Speculation | 289 |
| Software Thread Level Speculation for the Java Language and Virtual Machine Environment | 304 |
| Lightweight Monitoring of the Progress of Remotely Executing Computations | 319 |
| Compilation<br>Florian Schneider and Thomas R. Gross | |
| A Domain-Specific Interpreter for Parallelizing a Large Mixed-Language Visualisation Application | |
| Compiler Control Power Saving Scheme for Multi Core Processors<br>Jun Shirako, Naoto Oshiyama, Yasutaka Wada, Hiroaki Shikano, Keiji Kimura, and Hironori Kasahara | |
| Code Transformations for One-Pass Analysis | |
| Scalable Array SSA and Array Data Flow Analysis | |
| Interprocedural Symbolic Range Propagation for Optimizing Compilers | |
| Parallelization of Utility Programs Based on Behavior Phase Analysis | |
| A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization | |
| An Efficient Approach for Self-scheduling Parallel Loops on Multiprogrammed Parallel Computers | |
| Dynamic Compilation for Reducing Energy Consumption of I/O-Intensive Applications | |
| Supporting SELL for High-Performance Computing | |
| Compiler Supports and Optimizations for PAC VLIW DSP Processors | |