Abstract:
Currently, most scientific applications based on MPI adopt a compute-centric architecture. Needed data is accessed by MPI processes running on different nodes through a shared file system. Unfortunately, the explosive growth of scientific data undermines the high performance of MPI-based applications, especially in the execution environment of commodity clusters. In this paper, we present a novel approach to enable data locality computation for MPI-based data-intensive applications and refer to it as DL-MPI. DL-MPI allows MPI-based programs to obtain data distribution information for compute nodes through a novel data locality API. In addition, the problem of allocating data processing tasks to parallel processes is formulated as an integer optimization problem with the objectives of achieving data locality computation and optimal parallel execution time. For heterogeneous runtime environments, we propose a scheduling algorithm based on probability to dynamically schedule tasks to processes by evaluating the unprocessed local data and the computing ability of each compute node. We demonstrate the functionality of our methods through the implementation of scientific data processing programs as well as the incorporation of DL-MPI with existing HPC applications.
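The abstract does not give the scheduling rule itself, so the following is only a minimal sketch of the general idea of probability-based, locality-aware scheduling. It assumes, purely for illustration, that a process is chosen with probability proportional to the product of its unprocessed local data and its relative computing ability. The names `selection_weights` and `pick_process` and the weighting formula are hypothetical and are not taken from DL-MPI.

```python
import random


def selection_weights(local_unprocessed, speed):
    """Return one weight per process: unprocessed local data times relative speed.

    Assumption for illustration only: the paper's actual probability model
    is not stated in the abstract.
    """
    return [float(d) * float(s) for d, s in zip(local_unprocessed, speed)]


def pick_process(local_unprocessed, speed, rng=random):
    """Pick the index of the process that should receive the next task."""
    weights = selection_weights(local_unprocessed, speed)
    if sum(weights) == 0:
        # No process has local data left, so fall back to computing ability
        # alone (assumes every speed value is positive).
        weights = [float(s) for s in speed]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three processes: amounts of unprocessed local data and relative speeds.
    chosen = pick_process([4, 0, 2], [1.0, 2.0, 3.0], rng)
    print("next task goes to process", chosen)
```

Under this assumed weighting, a process with no unprocessed local data is never selected while other processes still hold local data, and faster nodes receive proportionally more of the remaining tasks.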
Published in: 2013 IEEE International Conference on Big Data
Date of Conference: 06-09 October 2013
Date Added to IEEE Xplore: 23 December 2013
Electronic ISBN: 978-1-4799-1293-3