DOI: 10.1145/2966884
EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
ACM 2016 Proceeding
Publisher: Association for Computing Machinery, New York, NY, United States
Conference: EuroMPI 2016: The 23rd European MPI Users' Group Meeting, Edinburgh, United Kingdom, September 25-28, 2016
ISBN: 978-1-4503-4234-6
Published: 25 September 2016

SESSION: Overall Winner
research-article
Towards millions of communicating threads

We explore in this paper the advantages that accrue from avoiding the use of wildcards in MPI. We show that, with this change, one can efficiently support millions of concurrently communicating light-weight threads using send-receive communication.
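
To make concrete what avoiding wildcards means, here is a hedged C sketch contrasting a wildcard receive with a fully specified one. It illustrates the message-matching problem, not the paper's runtime; src and tag are placeholders.

    #include <mpi.h>

    /* Illustration only: the matching burden that wildcard receives create.
     * A wildcard receive may match a message from any sender, forcing the
     * library to search a shared matching queue; a fully specified receive
     * can be matched against a per-source (or per-thread) queue. */
    void receive_examples(int src, int tag, MPI_Comm comm)
    {
        int buf;
        MPI_Status st;

        /* Wildcard receive: any source, any tag. */
        MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &st);

        /* Fully specified receive: source and tag known in advance. */
        MPI_Recv(&buf, 1, MPI_INT, src, tag, comm, &st);
    }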

SESSION: Runner-up Winners
research-article
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning

Emerging paradigms like High Performance Data Analytics (HPDA) and Deep Learning (DL) pose at least two new design challenges for existing MPI runtimes. First, these paradigms require efficient support for communicating unusually large messages ...

research-article
Generalisation of Recursive Doubling for AllReduce

The performance of AllReduce is crucial at scale. The recursive doubling with pairwise exchange algorithm theoretically achieves O(log₂ N) scaling for short messages with N peers, but is limited by improvements in network latency. A multi-way exchange ...
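
For reference, a minimal C sketch of the classic recursive doubling pattern the paper generalises, assuming MPI_SUM on a single double and a power-of-two number of ranks; the paper's multi-way exchange is not reproduced here.

    #include <mpi.h>

    /* After step k every rank holds the partial sum over a group of 2^(k+1)
     * ranks, so log2(N) exchange steps complete the reduction. */
    double allreduce_sum(double val, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);          /* assumed a power of two */

        for (int mask = 1; mask < size; mask <<= 1) {
            int peer = rank ^ mask;          /* pairwise exchange partner */
            double recv;
            MPI_Sendrecv(&val, 1, MPI_DOUBLE, peer, 0,
                         &recv, 1, MPI_DOUBLE, peer, 0,
                         comm, MPI_STATUS_IGNORE);
            val += recv;
        }
        return val;
    }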

SESSION: Scalability and the Road to Exascale
research-article
Space Performance Tradeoffs in Compressing MPI Group Data Structures

MPI is a popular programming paradigm on parallel machines today. MPI libraries sometimes use O(N) data structures to implement MPI functionality. The IBM Blue Gene/Q machine has 16 GB memory per node. If each node runs 32 MPI processes, only 512 MB is ...

research-article
Modeling MPI Communication Performance on SMP Nodes: Is it Time to Retire the Ping Pong Test

The "postal" model of communication [3, 8] T = α + βn, for sending n bytes of data between two processes with latency α and bandwidth 1/β, is perhaps the most commonly used communication performance model in parallel computing. This performance model is ...

research-article
Introducing Task-Containers as an Alternative to Runtime-Stacking

The advent of many-core architectures poses new challenges to the MPI programming model which has been designed for distributed memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by ...

SESSION: Fault tolerance
research-article
The MIG Framework: Enabling Transparent Process Migration in Open MPI

This paper introduces the mig framework: an Open MPI extension to transparently support the migration of application processes over different nodes of a distributed High-Performance Computing (HPC) system. The framework provides mechanisms on top of ...

research-article
Architecting Malleable MPI Applications for Priority-driven Adaptive Scheduling

Future supercomputers will need to support both traditional HPC applications and Big Data/High Performance Analysis applications seamlessly in a common environment. This motivates traditional job scheduling systems to support malleable jobs along with ...

research-article
Infrastructure and API Extensions for Elastic Execution of MPI Applications

Dynamic Processes support was added to MPI in version 2.0 of the standard. This feature of MPI has not been widely used by application developers in part due to the performance cost and limitations of the spawn operation. In this paper, we propose an ...
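
For context, the sketch below shows the standard MPI-2 spawn interface whose cost and limitations the paper addresses; "worker" is a hypothetical executable name, and the proposed extensions are not shown.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Comm intercomm;
        int errcodes[4];
        /* Launch 4 new processes and obtain an intercommunicator to them. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &intercomm, errcodes);

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }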

SESSION: Challenges and Extensions
research-article
A Library for Advanced Datatype Programming

We present a library providing functionality beyond the MPI standard for manipulating application data layouts described by MPI derived datatypes. The main contributions are: a) constructors for several new datatypes for describing application-relevant ...
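
For orientation, the fragment below uses only the standard datatype constructors that such a library builds on; the library's own, richer constructors are not reproduced here.

    #include <mpi.h>

    /* Describe every other double in an array as one noncontiguous layout
     * and send it without manual packing. */
    void send_strided(const double *buf, int count, int dest, MPI_Comm comm)
    {
        MPI_Datatype strided;
        /* count blocks of 1 element each, with a stride of 2 elements */
        MPI_Type_vector(count, 1, 2, MPI_DOUBLE, &strided);
        MPI_Type_commit(&strided);
        MPI_Send(buf, 1, strided, dest, 0, comm);
        MPI_Type_free(&strided);
    }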

research-article
On the Expected and Observed Communication Performance with MPI Derived Datatypes

We examine natural expectations on communication performance using MPI derived datatypes in comparison to the baseline, "raw" performance of communicating simple, noncontiguous data layouts. We show that common MPI libraries sometimes violate these ...

research-article
Public Access
MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale

MPI includes all processes in MPI_COMM_WORLD; this is untenable for reasons of scale, resiliency, and overhead. This paper offers a new approach, extending MPI with a new concept called Sessions, which makes two key contributions: a tighter integration ...
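
A hedged sketch of the Sessions idea, written against the interface later standardized in MPI-4.0; the prototype API described in the paper may differ. A communicator is derived from a named process set rather than from MPI_COMM_WORLD.

    #include <mpi.h>

    int main(void)
    {
        MPI_Session session;
        MPI_Group group;
        MPI_Comm comm;

        MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
        /* "mpi://WORLD" is a built-in process set name in MPI-4.0. */
        MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
        MPI_Comm_create_from_group(group, "org.example.tag", MPI_INFO_NULL,
                                   MPI_ERRORS_RETURN, &comm);

        /* ... use comm as usual ... */

        MPI_Comm_free(&comm);
        MPI_Group_free(&group);
        MPI_Session_finalize(&session);
        return 0;
    }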

SESSION: Parallel Applications using MPI
research-article
Distributed Memory Implementation Strategies for the kinetic Monte Carlo Algorithm

This paper presents strategies to parallelize a previously implemented kinetic Monte Carlo (kMC) algorithm. The process under simulation is the precipitation in an aluminum scandium alloy. The selected parallel algorithm is called synchronous parallel ...

research-article
How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms

Scientific workloads running on current extreme-scale systems routinely generate tremendous volumes of data for postprocessing. This data movement has become a serious issue due to its energy cost and the fact that I/O bandwidths have not kept pace with ...

short-paper
The Potential of Diffusive Load Balancing at Large Scale

Dynamic load balancing with diffusive methods is known to provide minimal load transfer and requires communication between neighbor nodes only. These are very attractive properties for highly parallel systems. We compare diffusive methods with state-of-...
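
A minimal sketch of one first-order diffusion step on a ring, assuming one scalar load value per rank and a hypothetical diffusion coefficient alpha; production schemes, and the state-of-the-art methods the paper compares against, are considerably more elaborate.

    #include <mpi.h>

    /* Exchange load values with both ring neighbours and move a fraction
     * alpha of each difference; heavily loaded ranks shed work. */
    double diffuse_step(double load, double alpha, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int nbr[2] = { (rank + size - 1) % size, (rank + 1) % size };

        double delta = 0.0;
        for (int i = 0; i < 2; i++) {
            double other;
            MPI_Sendrecv(&load, 1, MPI_DOUBLE, nbr[i], 0,
                         &other, 1, MPI_DOUBLE, nbr[i], 0,
                         comm, MPI_STATUS_IGNORE);
            delta += alpha * (other - load);
        }
        return load + delta;   /* new load after one step */
    }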

SESSION: Single-sided RDMA
research-article
Optimization of Message Passing Services on POWER8 InfiniBand Clusters

We present scalability and performance enhancements to MPI libraries on POWER8 InfiniBand clusters. We explore optimizations in the Parallel Active Messaging Interface (PAMI) libraries. We bypass IB VERBS via low-level inline calls, resulting in low ...

research-article
Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All

The MPI all-to-all algorithm is a data intensive, high-cost collective algorithm used by many scientific High Performance Computing applications. Optimizations for small data exchange use aggregation techniques, such as the Bruck algorithm, to minimize ...

short-paper
Revisiting RDMA Buffer Registration in the Context of Lightweight Multi-kernels

Lightweight multi-kernel architectures, where HPC specialized lightweight kernels (LWKs) run side-by-side with Linux on compute nodes, have received a great deal of attention recently due to their potential for addressing many of the challenges system ...

short-paper
An Evaluation of the One-Sided Performance in Open MPI

Open MPI provides an implementation of the MPI-3.1 standard supporting native communication over a wide range of high-performance network interconnects. As of version 2.0.0 Open MPI provides two implementations of the MPI-3.1 Remote Memory Access (RMA) ...
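
For reference, a minimal MPI-3 RMA fragment of the kind such an evaluation exercises: a window is exposed and one value is written into a peer's memory inside a fence epoch.

    #include <mpi.h>

    void rma_example(int peer, MPI_Comm comm)
    {
        double *base;
        MPI_Win win;
        /* Allocate one double per rank and expose it as an RMA window. */
        MPI_Win_allocate(sizeof(double), sizeof(double),
                         MPI_INFO_NULL, comm, &base, &win);
        *base = 0.0;

        double value = 42.0;
        MPI_Win_fence(0, win);                 /* open access epoch      */
        MPI_Put(&value, 1, MPI_DOUBLE, peer,
                0, 1, MPI_DOUBLE, win);        /* write to peer's window */
        MPI_Win_fence(0, win);                 /* complete the epoch     */

        MPI_Win_free(&win);
    }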

SESSION: Tools
research-article
Runtime Correctness Analysis of MPI-3 Nonblocking Collectives

The Message Passing Interface (MPI) includes nonblocking collective operations that support additional overlap between computation and communication. These new operations enable complex data movement between large numbers of processes. However, their ...
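
The overlap these operations enable follows the pattern sketched below; a typical correctness hazard that runtime analysis must catch is touching the buffer before the matching MPI_Wait completes.

    #include <mpi.h>

    void overlap_bcast(double *buf, int count,
                       double *other, int n, MPI_Comm comm)
    {
        MPI_Request req;
        MPI_Ibcast(buf, count, MPI_DOUBLE, 0, comm, &req);

        for (int i = 0; i < n; i++)    /* computation independent of buf */
            other[i] *= 2.0;

        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* only now may buf be used */
    }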

research-article
CAF Events Implementation Using MPI-3 Capabilities

MPI-3.1 is currently the most recent version of the MPI standard. It adds important extensions to MPI-2, including a simplified semantic for the one-sided communication routines and a new tool interface, capable of exposing performance data of the MPI ...

short-paper
Allowing MPI tools builders to forget about Fortran

C tool writers are forced to deal with a number of Fortran and C interoperability issues when intercepting MPI routines and completing them with PMPI. The C-based tool has to intercept the Fortran MPI routines and marshal arguments between C and Fortran, ...
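
For background, the standard C-side PMPI interception pattern looks as follows. A Fortran application calls a symbol such as mpi_send_, which this C wrapper does not catch; that mismatch is exactly the interoperability burden at issue.

    #include <mpi.h>
    #include <stdio.h>

    /* The tool's wrapper shadows MPI_Send, records what it needs, and
     * forwards to the name-shifted PMPI entry point. */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        printf("intercepted MPI_Send of %d elements to rank %d\n",
               count, dest);
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }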

POSTER SESSION: Posters
poster
FFT data distribution in plane-waves DFT codes. A case study from Quantum ESPRESSO

Density Functional Theory calculations with plane waves and pseudopotentials represent one of the most important simulation techniques in high performance computing. Together with parallel linear algebra (ZGEMM and matrix diagonalization), the most ...

poster
Optimizing PARSEC for Knights Landing

PARSEC is a massively parallel Density-Functional-Theory (DFT) code. Within the modernization effort towards the new Intel Knights Landing platform, we adapted the main computational kernel, represented as high-order finite-difference stencils, to use ...

poster
Effective Calculation with Halo communication using Halo Functions

Halo communication limits parallel scalability. To overcome this issue, we introduced a "Halo thread" into our simulation code. However, this does not fundamentally solve the problem under strong scaling. In this study, we have ...
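
For context, a generic 1-D halo exchange in plain MPI, illustrating the communication pattern the poster targets rather than its proposed interface; u is a local array with one ghost cell at each end, and left/right may be MPI_PROC_NULL at a domain boundary.

    #include <mpi.h>

    void halo_exchange(double *u, int n, int left, int right, MPI_Comm comm)
    {
        /* Send rightmost interior cell right, receive left ghost cell. */
        MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 0,
                     &u[0], 1, MPI_DOUBLE, left,  0,
                     comm, MPI_STATUS_IGNORE);
        /* Send leftmost interior cell left, receive right ghost cell. */
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                     &u[n + 1], 1, MPI_DOUBLE, right, 1,
                     comm, MPI_STATUS_IGNORE);
    }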

poster
Public Access
MPI usage at NERSC: Present and Future

In this poster, we describe how MPI is used at the National Energy Research Scientific Computing Center (NERSC). NERSC is the production high-performance computing center for the US Department of Energy, with more than 5000 users and 800 distinct ...

poster
Performance comparison of Eulerian kinetic Vlasov code between flat-MPI parallelism and hybrid parallelism on Fujitsu FX100 supercomputer

The present study deals with a Vlasov simulation code, which solves the first-principles kinetic equation for space plasma, the Vlasov equation. In the present study, a five-dimensional Vlasov code with two spatial dimensions and three velocity ...

Contributors
  • The University of Tennessee, Knoxville
  • Vienna University of Technology


Acceptance Rates

Overall Acceptance Rate: 66 of 139 submissions, 47%

Year          Submitted   Accepted   Rate
EuroMPI '19          26         13    50%
EuroMPI '17          37         17    46%
EuroMPI '15          29         14    48%
EuroMPI '13          47         22    47%
Overall             139         66    47%