Energy efficiency and performance frontiers for sparse computations on GPU supercomputers
In this paper we unveil some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. To do this, we consider state-of-the-art implementations of the sparse matrix-vector (SpMV) product in libraries like cuSPARSE, ...
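The kernel studied here, the SpMV product, computes y = Ax for a sparse matrix A. The sketch below is a minimal sequential version using the common compressed sparse row (CSR) layout. It only illustrates the operation; it is not the cuSPARSE implementation or any of the GPU kernels evaluated in the paper, and the function and argument names are hypothetical.

```python
from typing import List


def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    values  -- nonzero entries of A, stored row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row i's nonzeros occupy values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x, touching only nonzeros.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]] in CSR form, multiplied by x = [1, 2, 3].
y = spmv_csr([4.0, 1.0, 2.0, 3.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5],
             [1.0, 2.0, 3.0])
```

Because each row's dot product is independent, GPU implementations typically parallelize the outer loop, for example by assigning one thread or one warp per row. The irregular, indirect reads of `x` are what make SpMV memory-bound.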
Energy-efficient computing for HPC workloads on heterogeneous manycore chips
Power and energy efficiency are among the major challenges to achieving exascale computing in the next several years. While chips operating at low voltages have been shown in prior studies to be highly energy-efficient, low voltage operations lead to heterogeneity ...
A performance study of Java garbage collectors on multicore architectures
In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) have increasingly been used on large-scale multicore servers. The garbage collector (GC) is a critical component of the JVM and has a significant influence on ...
Toward an evolutionary task parallel integrated MPI + X programming model
- Richard F. Barrett,
- Dylan T. Stark,
- Courtenay T. Vaughan,
- Ryan E. Grant,
- Stephen L. Olivier,
- Kevin T. Pedretti
The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as tasks, to smooth out utilization of the computing resource, in particular the node ...
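The over-decomposition idea can be illustrated with a small sketch under simplifying assumptions: a 1-D index domain, a thread pool standing in for the cores of a node, and a placeholder per-chunk computation. The helper names and the `tasks_per_worker` factor are hypothetical; this is not the authors' MPI + X runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def over_decompose(n: int, n_tasks: int) -> List[Tuple[int, int]]:
    """Split the index range [0, n) into n_tasks contiguous, nearly equal chunks."""
    base, extra = divmod(n, n_tasks)
    chunks, start = [], 0
    for t in range(n_tasks):
        # The first `extra` chunks take one additional element each.
        size = base + (1 if t < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def work(chunk: Tuple[int, int]) -> int:
    # Hypothetical per-task computation: sum of squares over the chunk.
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))


def run(n: int, n_workers: int, tasks_per_worker: int = 8) -> int:
    # Over-decomposition: create many more tasks than workers, so a worker
    # that finishes early picks up remaining chunks instead of sitting idle
    # at a bulk-synchronous barrier.
    chunks = over_decompose(n, n_workers * tasks_per_worker)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(work, chunks))
```

The sketch shows the decomposition and dynamic scheduling structure only. In CPython, threads do not speed up CPU-bound Python code because of the global interpreter lock, so a real implementation would use native threads or processes.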
Design and evaluation of a novel dataflow based bigdata solution
As the attention given to big data grows, cluster computing systems for distributed processing of large data sets have become mainstream and a critical requirement in high performance distributed systems research. One of the most successful systems is Hadoop ...
Programming support for reconfigurable custom vector architectures
High performance requirements have increased the popularity of unconventional architectures. While providing better performance, such architectures are generally harder to program and to generate code for. In this paper, we present our approach to ease ...
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory ...
Parallelism vs. speculation: exploiting speculative genetic algorithm on GPU
Graphics Processing Units (GPUs) have shown stunning computing power for scientific applications in the past few years, attracting attention from both industry and academia. The huge number of cores means high parallelism and also powerful computation ...
GPU technology applied to reverse time migration and seismic modeling via OpenACC
GPU computing offers tremendous potential to accelerate complex scientific applications and is becoming a leading force in speeding up seismic imaging and velocity analysis techniques. Developing portable code is a challenge that can be overcome using ...
Parallelizing a discrete event simulation application using the Habanero-Java multicore library
Discrete event simulation (DES) has been widely adopted for simulating communication systems such as computer networks. As the network size and the complexity of communication patterns increase, the complexity of simulation tools and the execution time of ...
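For context, a sequential DES kernel repeatedly removes the earliest pending event from a time-ordered queue, advances the simulation clock to that event's timestamp, and runs its handler, which may schedule further events. This event loop is what parallelization efforts such as this one must restructure. The sketch below shows that generic structure under the assumption of a single global event queue; the class, the handler, and the packet example are hypothetical, and this is neither the paper's simulator nor Habanero-Java code.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

Handler = Callable[["Simulator"], None]


class Simulator:
    """Minimal sequential discrete event simulation kernel."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: List[Tuple[float, int, Handler]] = []
        # Sequence counter breaks timestamp ties in FIFO order and keeps
        # heapq from ever comparing two handler functions.
        self._seq = itertools.count()

    def schedule(self, delay: float, handler: Handler) -> None:
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), handler))

    def run(self, until: float) -> None:
        # Pop the earliest event, advance the clock, execute the handler.
        while self._queue and self._queue[0][0] <= until:
            time, _, handler = heapq.heappop(self._queue)
            self.now = time
            handler(self)


# Usage: a hypothetical source that emits a packet every 1.5 time units.
arrivals: List[float] = []


def send_packet(sim: Simulator) -> None:
    arrivals.append(sim.now)
    sim.schedule(1.5, send_packet)


sim = Simulator()
sim.schedule(0.0, send_packet)
sim.run(until=5.0)
```

After `run(until=5.0)`, `arrivals` holds the packet emission times up to the horizon. The event at time 6.0 remains queued because it lies past `until`.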
RaftLib: a C++ template library for high performance stream parallel processing
Stream processing or data-flow programming is a compute paradigm that has been around for decades in many forms, yet has failed to garner the same attention as other mainstream languages and libraries (e.g., C++ or OpenMP [15]). Stream processing has great ...
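The stream-processing model can be sketched as independent compute kernels, each running concurrently and connected only by FIFO streams. The example below is written in Python with threads and bounded `queue.Queue` objects as the streams, purely to show the paradigm. It does not use or mimic RaftLib's C++ API, and the stage functions, the queue capacity of 16, and the end-of-stream sentinel are assumptions for illustration.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # hypothetical end-of-stream sentinel


def source(items: Iterable[int], out_q: queue.Queue) -> None:
    for item in items:
        out_q.put(item)  # blocks if the downstream stream is full
    out_q.put(_DONE)


def stage(fn: Callable[[int], int], in_q: queue.Queue, out_q: queue.Queue) -> None:
    # Each kernel runs in its own thread, reading its input stream and
    # writing its output stream until the end-of-stream sentinel arrives.
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(fn(item))


def sink(in_q: queue.Queue, results: List[int]) -> None:
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        results.append(item)


def run_pipeline(items: Iterable[int]) -> List[int]:
    # Three bounded FIFO streams connect four kernels:
    # source -> square -> increment -> sink.
    q1, q2, q3 = (queue.Queue(maxsize=16) for _ in range(3))
    results: List[int] = []
    threads = [
        threading.Thread(target=source, args=(items, q1)),
        threading.Thread(target=stage, args=(lambda x: x * x, q1, q2)),
        threading.Thread(target=stage, args=(lambda x: x + 1, q2, q3)),
        threading.Thread(target=sink, args=(q3, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Since every kernel is a single thread reading one FIFO, output order matches input order. The bounded queues provide back-pressure: a fast producer blocks rather than exhausting memory when a downstream kernel falls behind.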
A Java util concurrent park contention tool
Java Util Concurrent (JUC) is a widely used library in multithreaded Java applications. JUC provides a variety of tools such as locks, thread pools and blocking queues. Many of these constructs use Thread Park, a mechanism which allows threads to be ...
Debugging parallel programs using fork handlers
Multicore computers can now be found everywhere, from mobile phones to high-end servers. However, producing parallel programs that take advantage of these computers is not easy: parallel programs are error-prone, and finding these errors and ...
Effective communication for a system of cluster-on-a-chip processors
In this work, we analyze efficient communication methods for a grid of many-core processors in the absence of cache coherence. For this study, we build a multi-chip processor with 240 tightly connected cores and demonstrate its scalability. This ...
Exploiting communication concurrency on high performance computing systems
Applications may not exploit enough instantaneous communication concurrency to maximize hardware utilization on HPC systems, even when that concurrency is logically available. This is exacerbated in hybrid programming models such as SPMD+OpenMP. We present the design of a "...
Patty: a pattern-based parallelization tool for the multicore age
The free lunch of ever-increasing clock frequencies is over. Performance-critical sequential software must be parallelized, and this is tedious, hard, error-prone, knowledge-intensive, and time-consuming. In order to assist software engineers appropriately, ...
Deadlock-free buffer configuration for stream computing
Stream computing is a popular paradigm for parallel and distributed computing, which features computing nodes connected by first-in first-out (FIFO) data channels. To increase the efficiency of communication links and boost application throughput, ...
Supporting multiple accelerators in high-level programming models
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering applications. Efficiently exploiting the massive parallelism these ...
- Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores