DOI: 10.1145/2712386
PMAM '15: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores
ACM 2015 Proceedings
Publisher: Association for Computing Machinery, New York, NY, United States
Conference: PPoPP '15: 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, California, February 7-8, 2015
ISBN: 978-1-4503-3404-4
Published: 07 February 2015

Abstract

No abstract available.

Table of Contents
research-article
Public Access
Energy efficiency and performance frontiers for sparse computations on GPU supercomputers

In this paper we unveil some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. To do this, we consider state-of-the-art implementations of the sparse matrix-vector (SpMV) product in libraries like cuSPARSE, ...

research-article
Energy-efficient computing for HPC workloads on heterogeneous manycore chips

Power and energy efficiency are among the major challenges to achieving exascale computing in the next several years. While studies have shown chips operating at low voltages to be highly energy-efficient, low-voltage operation leads to heterogeneity ...

research-article
A performance study of Java garbage collectors on multicore architectures

In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) are increasingly used on large-scale multicore servers. The garbage collector (GC) represents a critical component of the JVM and has a significant influence on ...

research-article
Toward an evolutionary task parallel integrated MPI + X programming model

The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as tasks, to smooth out utilization of the computing resource, in particular the node ...
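Over-decomposition, as described in this abstract, means splitting the domain into many more pieces than there are workers, so that a worker that finishes early can pick up the next task instead of idling until a bulk-synchronous barrier. The following is a minimal sketch in Python, assuming a one-dimensional domain and a hypothetical per-chunk workload (`process_chunk`, `over_decomposed_sum`, and `chunks_per_worker` are illustrative names); it is not the paper's MPI + X runtime. Under CPython's global interpreter lock these threads will not speed up CPU-bound work, so the sketch shows only the task structure.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(domain, start, stop):
    # Hypothetical stand-in for per-subdomain work: sum of squares over the chunk.
    return sum(v * v for v in domain[start:stop])


def over_decomposed_sum(domain, n_workers=4, chunks_per_worker=8):
    # Create many more tasks than workers so load imbalance is smoothed out.
    n_chunks = n_workers * chunks_per_worker
    size = max(1, -(-len(domain) // n_chunks))  # ceiling division, at least 1
    bounds = [(s, min(s + size, len(domain)))
              for s in range(0, len(domain), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Idle workers pull the next pending chunk from the pool's task queue.
        futures = [pool.submit(process_chunk, domain, s, e) for s, e in bounds]
        return sum(f.result() for f in futures)


data = list(range(1000))
print(over_decomposed_sum(data))  # 332833500
```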

research-article
Design and evaluation of a novel dataflow based bigdata solution

As the attention given to big data grows, cluster computing systems for distributed processing of large data sets have become a mainstream and critical requirement in high performance distributed systems research. One of the most successful systems is Hadoop ...

research-article
Programming support for reconfigurable custom vector architectures

High performance requirements have increased the popularity of unconventional architectures. While such architectures provide better performance, they are generally harder to program and to generate code for. In this paper, we present our approach to ease ...

research-article
Public Access
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory ...

research-article
Parallelism vs. speculation: exploiting speculative genetic algorithm on GPU

Graphics Processing Units (GPUs) have shown stunning computing power for scientific applications in the past few years, attracting attention from both industry and academia. The huge number of cores means high parallelism and also powerful computation ...

research-article
GPU technology applied to reverse time migration and seismic modeling via OpenACC

GPU computing offers tremendous potential to accelerate complex scientific applications and is becoming a leading force in speeding up seismic imaging and velocity analysis techniques. Developing portable code is a challenge that can be overcome using ...

research-article
Parallelizing a discrete event simulation application using the Habanero-Java multicore library

Discrete event simulation (DES) has been widely adopted for simulating communication systems such as computer networks. As the network size and the complexity of communication patterns increase, the complexity of simulation tools and the execution time of ...

research-article
RaftLib: a C++ template library for high performance stream parallel processing

Stream processing or data-flow programming is a compute paradigm that has been around for decades in many forms, yet it has failed to garner the same attention as other mainstream languages and libraries (e.g., C++ or OpenMP [15]). Stream processing has great ...

research-article
A Java util concurrent park contention tool

Java Util Concurrent (JUC) is a widely used library in multithreaded Java applications. JUC provides a variety of tools such as locks, thread pools and blocking queues. Many of these constructs use Thread Park, a mechanism which allows threads to be ...

research-article
Debugging parallel programs using fork handlers

Multicore computers are now found everywhere, from mobile phones to high-end servers. However, producing parallel programs that take advantage of these computers is not easy: parallel programs are error-prone, and finding these errors and ...

research-article
Effective communication for a system of cluster-on-a-chip processors

In this work, we analyze efficient communication methods for a grid of many-core processors in the absence of cache coherence. For this study, we build a multi-chip processor with 240 tightly connected cores and demonstrate its scalability. This ...

research-article
Public Access
Exploiting communication concurrency on high performance computing systems

Although logically available, applications may not exploit enough instantaneous communication concurrency to maximize hardware utilization on HPC systems. This is exacerbated in hybrid programming models such as SPMD+OpenMP. We present the design of a "...

research-article
Patty: a pattern-based parallelization tool for the multicore age

The free lunch of ever increasing clock frequencies is over. Performance-critical sequential software must be parallelized, and this is tedious, hard, buggy, knowledge-intensive, and time-consuming. In order to assist software engineers appropriately, ...

research-article
Deadlock-free buffer configuration for stream computing

Stream computing is a popular paradigm for parallel and distributed computing, which features computing nodes connected by first-in first-out (FIFO) data channels. To increase the efficiency of communication links and boost application throughput, ...
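The FIFO-channel model in this abstract can be sketched with Python's `queue.Queue`, whose `maxsize` argument bounds each channel's buffer. Below is a minimal three-stage pipeline, assuming a `None` sentinel marks the end of the stream; the stage functions, `run_pipeline`, and the default buffer size of 2 are illustrative choices, not the paper's buffer-configuration algorithm.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)


def producer(out_q, items):
    for item in items:
        out_q.put(item)  # blocks while the bounded FIFO is full (back-pressure)
    out_q.put(SENTINEL)


def square_stage(in_q, out_q):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # propagate end-of-stream downstream
            break
        out_q.put(item * item)


def consumer(in_q, results):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item)


def run_pipeline(items, buffer_size=2):
    # Each channel is a FIFO with a fixed capacity: the quantity a
    # buffer-configuration scheme would have to choose per channel.
    q1 = queue.Queue(maxsize=buffer_size)
    q2 = queue.Queue(maxsize=buffer_size)
    results = []
    threads = [
        threading.Thread(target=producer, args=(q1, items)),
        threading.Thread(target=square_stage, args=(q1, q2)),
        threading.Thread(target=consumer, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline(range(5)))  # [0, 1, 4, 9, 16]
```

A full channel blocks its producer until the consumer frees a slot. A linear chain like this one cannot deadlock, because the last stage always drains its input; deadlock risk from undersized buffers typically arises in graphs where paths split and later rejoin, which is the setting a buffer-configuration scheme has to handle.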

research-article
Supporting multiple accelerators in high-level programming models

Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering applications. Efficiently exploiting the massive parallelism these ...


    Contributors
    • Shanghai Jiao Tong University
    • University of Otago


    Acceptance Rates

    PMAM '15 Paper Acceptance Rate 19 of 34 submissions, 56%;
    Overall Acceptance Rate 53 of 97 submissions, 55%
    Year        Submitted   Accepted   Rate
    PMAM '20    15          8          53%
    PMAM '19    17          10         59%
    PMAM '18    17          9          53%
    PMAM '17    14          7          50%
    PMAM '15    34          19         56%
    Overall     97          53         55%