It is our great pleasure to welcome you to the third edition of IA3, the Workshop on Irregular Applications: Architectures and Algorithms, held in conjunction with SC'13.
Many data-intensive and scientific applications are irregular by nature. They may exhibit irregular data structures, irregular control flow, or irregular communication. Current supercomputing systems are organized around components optimized for data locality and regular computation. Developing irregular applications on them demands substantial effort and often leads to poor performance. Yet executing irregular applications efficiently will be a key requirement for future systems. As unstructured data continues to grow exponentially, new approaches and solutions for managing it are required.
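To make "irregular" concrete, here is a minimal sketch contrasting a regular loop (a dense dot product, which streams through memory in order) with an irregular one: sparse matrix-vector multiplication over a matrix stored in CSR (compressed sparse row) form. CSR is simply a common format chosen for illustration, and the function names are hypothetical. In the sparse case, the addresses read from `x` depend on the data (`indices`), and the trip count of the inner loop varies from row to row.

```python
def dense_dot(a, x):
    # Regular: consecutive elements are read in order, so addresses are predictable.
    return sum(ai * xi for ai, xi in zip(a, x))


def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr[r]:indptr[r+1] delimits row r's entries in `indices` and `data`.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Row lengths differ: irregular control flow.
        for k in range(indptr[row], indptr[row + 1]):
            # x[indices[k]] is an indirect, data-dependent access: irregular memory traffic.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# A = [[4, 0, 1],
#      [0, 0, 2],
#      [3, 5, 0]]
print(csr_spmv([0, 2, 3, 5], [0, 2, 2, 0, 1], [4.0, 1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0]))
# [7.0, 6.0, 13.0]
```

On hardware tuned for locality, the dense loop benefits from prefetching and wide vector loads. The gather through `indices` does not, which is one reason irregular codes often fall short of peak performance.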
The solutions needed to address the challenges of irregular applications can only come from considering the problem from all perspectives: from micro- to system-architectures, from compilers to languages, from libraries to runtimes, and from algorithm design to data characteristics. Only collaborative efforts among researchers with different expertise, including end users, domain experts, and computer scientists, can lead to significant breakthroughs. This workshop continues to pursue its objective of bringing together scientists with all these different backgrounds to discuss, define, and design methods and technologies for efficiently supporting irregular applications on current and future systems.
Proceedings Papers
A novel finite element method assembler for co-processors and accelerators
The finite element method (FEM) is a popular approach to solving differential equations [5]. Among its many attractive features is its ability to handle complex geometries. The domain is discretised using simple elements whose local contributions are ...
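As a generic illustration of how local element contributions are combined (this is not the assembler proposed in the paper), the sketch below assembles 1D linear elements for the operator -u'' into a global sparse matrix held as a dictionary. The function names and the 1D setting are assumptions made for brevity. The point is the scatter-add: several elements update the same global entry, and on unstructured meshes the element-to-node map is arbitrary.

```python
from collections import defaultdict


def local_stiffness(x0, x1):
    """Element stiffness matrix of a 1D linear element on [x0, x1] for -u''."""
    h = abs(x1 - x0)  # element length, independent of node ordering
    return [[1.0 / h, -1.0 / h],
            [-1.0 / h, 1.0 / h]]


def assemble(nodes, elements):
    """Scatter-add local element matrices into a global sparse matrix.

    nodes:    list of node coordinates
    elements: list of (i, j) global node indices, one pair per element
    Returns a dict mapping (row, col) -> value.
    """
    K = defaultdict(float)
    for (i, j) in elements:
        ke = local_stiffness(nodes[i], nodes[j])
        dofs = (i, j)  # local-to-global map; arbitrary on unstructured meshes
        for a in range(2):
            for b in range(2):
                # Neighbouring elements write to the same (row, col) entry.
                # A parallel assembler must resolve these conflicts, e.g. with
                # atomics, element colouring, or a reduction.
                K[(dofs[a], dofs[b])] += ke[a][b]
    return dict(K)


K = assemble([0.0, 0.5, 1.0], [(0, 1), (1, 2)])
print(K[(1, 1)])  # 4.0: node 1 is shared by both elements
```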
The energy case for graph processing on hybrid CPU and GPU systems
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph-layout to ...
A synthetic task model for HPC-grade optical network performance evaluation
With vastly increasing system parallelism, energy-efficient data movement has emerged as one of the key challenges in High Performance Computing (HPC). Optics offers the potential for creating system-wide interconnection networks with extremely high ...
Maximizing the performance of irregular applications on multithreaded, NUMA systems
In modern shared-memory systems, the communication latency and available resources for a group of logical processors are determined by their relative position in the hierarchy of chips, cores, and hardware threads. Thus the performance of multithreaded ...
Analysis of computing and energy performance of multicore, NUMA, and manycore platforms for an irregular application
The exponential growth in processor performance seems to have reached a turning point. Nowadays, energy efficiency is as important as performance and has become a critical aspect of the development of scalable systems. These strict energy constraints ...
In-memory data compression for sparse matrices
We present a high performance in-memory lossless data compression scheme designed to save both memory storage and bandwidth for general sparse matrices. Because the storage hierarchy is increasingly becoming the limiting factor in overall delivered ...
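For readers unfamiliar with lossless compression of sparse-matrix metadata, the following minimal sketch shows one well-known technique: delta-encoding each row's column indices and packing the gaps as base-128 variable-length integers (varints). This is a swapped-in, generic example, not the compression scheme presented in the paper. It assumes a CSR layout with column indices sorted within each row, and all function names are hypothetical.

```python
def encode_varint(value, out):
    """Append an unsigned integer to `out` as a little-endian base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return


def compress_row_indices(indptr, indices):
    """Delta-encode each row's sorted column indices, then varint-pack the gaps."""
    out = bytearray()
    for row in range(len(indptr) - 1):
        prev = 0
        for k in range(indptr[row], indptr[row + 1]):
            # Gaps between clustered nonzeros are small and fit in one byte.
            encode_varint(indices[k] - prev, out)
            prev = indices[k]
    return bytes(out)


def decompress_row_indices(indptr, blob):
    """Invert compress_row_indices, using indptr to know each row's length."""
    indices, pos = [], 0
    for row in range(len(indptr) - 1):
        prev = 0
        for _ in range(indptr[row + 1] - indptr[row]):
            value, shift = 0, 0
            while True:
                byte = blob[pos]
                pos += 1
                value |= (byte & 0x7F) << shift
                shift += 7
                if not byte & 0x80:
                    break
            prev += value
            indices.append(prev)
    return indices


indptr, indices = [0, 3, 5], [1000, 1001, 1003, 5, 300]
blob = compress_row_indices(indptr, indices)
print(len(blob))  # 7 bytes, versus 20 bytes for five 32-bit indices
```

The trade-off is extra decode work per nonzero in exchange for fewer bytes moved, which pays off when memory bandwidth rather than arithmetic is the bottleneck.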
On the GPU performance of cell-centered finite volume method over unstructured tetrahedral meshes
Finite volume methods are widely used numerical strategies for solving partial differential equations. This paper aims at obtaining a quantitative understanding of the achievable GPU performance of finite volume computations in the context of the cell-...
Nonzero pattern analysis and memory access optimization in GPU-based sparse LU factorization for circuit simulation
The sparse matrix solver is a critical component in circuit simulators. Some researchers have developed GPU-based LU factorization approaches to accelerate the sparse solver. But the performance of these solvers is constrained by the irregularities of ...
Register level sort algorithm on multi-core SIMD processors
State-of-the-art hardware increasingly utilizes SIMD parallelism, where multiple processing elements execute the same instruction on multiple data points simultaneously. However, irregular and data-intensive algorithms are not well suited for such ...
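One standard way to make sorting SIMD-friendly is a sorting network, a fixed, data-independent sequence of compare-exchange steps that can map onto vector min/max instructions. The sketch below shows the classic bitonic sorting network in scalar Python. It is a generic illustration and is not claimed to be the register-level algorithm of the paper. It assumes the input length is a power of two.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic sorting network.

    Which pairs are compared, and in which direction, depends only on the
    indices, never on the data. A SIMD implementation can therefore perform
    each inner pass as vector min/max operations across lanes.
    """
    a = list(values)
    n = len(a)
    if (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # compare-exchange distance within this merge stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    lo = min(a[i], a[partner])
                    hi = max(a[i], a[partner])
                    if (i & k) == 0:      # this block is sorted ascending
                        a[i], a[partner] = lo, hi
                    else:                 # this block is sorted descending
                        a[i], a[partner] = hi, lo
            j //= 2
        k *= 2
    return a


print(bitonic_sort([5, 1, 4, 8, 2, 7, 3, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The network performs O(n log² n) comparisons, more than a comparison sort needs, but it has no data-dependent branches. That is the property that matters on SIMD hardware.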
Parallel sparse FFT
The Fast Fourier Transform (FFT) is a widely used numerical algorithm. When N input data points lead to only k << N non-zero coefficients in the transformed domain, the algorithm is clearly inefficient: the FFT performs O(N log N) operations on N input ...
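The inefficiency described above can be seen in a small experiment. A signal built from k = 3 tones has only three non-zero DFT coefficients, yet a standard radix-2 FFT still computes all N of them. The sketch below (a textbook recursive Cooley-Tukey FFT with hypothetical example frequencies) only illustrates this premise. It does not implement a sparse FFT algorithm.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    Computes all N output coefficients in O(N log N) operations.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for m in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + t
        out[m + n // 2] = even[m] - t
    return out


N = 64
tones = {3: 1.0, 17: 0.5, 40: 2.0}  # hypothetical example: k = 3 frequencies
signal = [sum(amp * cmath.exp(2j * cmath.pi * f * t / N) for f, amp in tones.items())
          for t in range(N)]
spectrum = fft(signal)  # all 64 coefficients computed...
support = [f for f, c in enumerate(spectrum) if abs(c) > 1e-6 * N]
print(support)  # [3, 17, 40]: ...but only 3 are non-zero
```

Sparse FFT algorithms aim to recover just these k coefficients with work that scales with k rather than N.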
A communications simulation methodology for AMR codes using task dependency analysis
The ability to predict the performance of irregular, asynchronous applications on future hardware is essential to the exascale co-design process. Adaptive Mesh Refinement (AMR) applications are inherently irregular and dynamic in their computation and ...
Parallel implementations of ensemble data assimilation for atmospheric prediction
Numerical models are used to find approximate solutions to the coupled nonlinear partial differential equations associated with the prediction of the atmosphere. The model state can be represented by a grid of discrete values; subsets of grid points are ...
Index Terms
- Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms