The Workshop on Languages, Compilers, and Run-time Support for Scalable Systems (LCR) is a bi-annual gathering of computer scientists who develop enabling software for large scale parallel and distributed applications, held in the off-year for PPoPP. Attendance for this workshop is limited to 75 participants. The LCR community is interested in a broad range of technologies, with a common goal of developing systems that enable real applications.
Scalability is important for scientific as well as commercial applications, and has to be addressed across computation models spanning data parallel computing, grid computing, peer-to-peer computing, content distribution networks, and multi-threaded servers. For this seventh meeting, LCR 04, we are especially interested in cross-fertilization of ideas in supporting and enhancing scalability across this spectrum of applications and computation models. Topics of interest to the workshop include the following:
- Emerging applications
- Language features and compilation
- Communication systems and libraries
- Resource management
- Integration of compiler and runtime systems
- Scalable I/O
- Performance evaluation
- Debuggers
- Fault tolerance and reliability
- Operating Systems
Overcoming barriers to restructuring in a modular visualisation environment
This paper explores the potential for automatic cross-component optimisation in the Python / VTK-based MayaVi modular visualisation environment. The idea is to delay execution of the VTK components called from the MayaVi tool, which requires no ...
Runtime support for integrating precomputation and thread-level parallelism on simultaneous multithreaded processors
This paper presents runtime mechanisms that enable flexible use of speculative precomputation in conjunction with thread-level parallelism on SMT processors. The mechanisms were implemented and evaluated on a real multi-SMT system. So far, speculative ...
Efficient data driven run-time code generation
Knowledge of data values at run-time allows us to generate better code in terms of efficiency, size and power consumption. This paper introduces a low-level compiling technique based on a minimal code generator with parametric embedded sections to ...
General parallel computations on desktop grid and P2P systems
This paper defines the requirements for effective execution of iterative computations requiring communication on a desktop grid. It then proposes a combination of a p2p communication model, an algorithmic approach (asynchronous iterations) and a ...
Memory access analysis and optimization approaches on splay trees
Splay trees, a type of self-adjusting search tree, are introduced and analyzed. Since they have been widely used in search problems, any performance improvements will yield great benefits. First, the paper introduces some background about splay trees ...
Addressing the trust asymmetry problem in grid computing with encrypted computation
Trust asymmetry is a core, albeit rarely discussed, problem in scalable computing. Techniques for protecting a host's operating system (and other processes) from a user's process are well understood and widely deployed. However, there is currently no ...
The Hierarchically Tiled Arrays programming approach
- Basilio B. Fraguela,
- Jia Guo,
- Ganesh Bikshandi,
- María J. Garzarán,
- Gheorghe Almási,
- José Moreira,
- David Padua
In this paper, we show our initial experience with a class of objects, called Hierarchically Tiled Arrays (HTAs), that encapsulate parallelism. HTAs allow the construction of single-threaded parallel programs where a master process distributes tasks to ...
An orchestration language for parallel objects
Charm++, a parallel object language based on the idea of virtual processors, has attained significant success in efficient parallelization of applications. Requiring the user to only decompose the computation into a large number of objects ("virtual ...
Comparing Ethernet and Myrinet for MPI communication
This paper compares the performance of Myrinet and Ethernet as a communication substrate for MPI libraries. MPI library implementations for Myrinet utilize user-level communication protocols to provide low latency and high bandwidth MPI messaging. In ...
Design tradeoffs in modern software transactional memory systems
Software Transactional Memory (STM) is a generic non-blocking synchronization construct that enables automatic conversion of correct sequential objects into correct concurrent objects. Because it is nonblocking, STM avoids traditional performance and ...
Combined compile-time and runtime-driven, pro-active data movement in software DSM systems
Scientific applications contain program sections that exhibit repetitive data accesses. This paper proposes combined compile-time/runtime data reference analysis techniques that exploit repetitive data access behavior in both regular and irregular ...
A programming language for ad-hoc networks of mobile devices
Networks of mobile devices and embedded systems represent a new computing platform. Typical network nodes range from sensors, cell phones, and PDAs to laptop computers. Wireless ad-hoc networks are used to connect these heterogeneous nodes, each of which ...
Compiler-generated staggered checkpointing
To minimize work lost due to system failures, large parallel applications perform periodic checkpoints. These checkpoints are typically inserted manually by application programmers, resulting in synchronous checkpoints, or checkpoints that occur at the ...
Looking at the server side of peer-to-peer systems
Peer-to-peer systems have grown significantly in popularity over the last few years. An increasing number of research projects have been closely following this trend, looking at many of the paradigm's technical aspects. In the context of data-sharing ...
Dynamic topology adaptation of virtual networks of virtual machines
Virtual machine grid computing greatly simplifies the use of widespread computing resources by lowering the level of abstraction, benefiting both resource providers and users. For the user, the Virtuoso middleware that we are developing closely emulates ...
Replicating memory behavior for performance prediction
This paper introduces a method to monitor an application and generate a short synthetic "memory skeleton" program whose memory access pattern is representative of the application. In particular, the application and its memory skeleton should have ...
IMPuLSE: integrated monitoring and profiling for large-scale environments
A lack of efficient system software is an increasing impediment to deploying large-scale parallel and distributed systems. Systemically addressing operating system-induced performance anomalies requires accurate, low-overhead, whole-system monitoring, ...