Abstract:
Intel Many Integrated Core (MIC) architectures are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. Partitioned Global Address Space (PGAS) programming models, such as OpenSHMEM, provide an attractive approach for developing scientific applications with irregular communication characteristics by abstracting a shared memory address space and offering one-sided communication semantics. However, the current OpenSHMEM standard does not efficiently support heterogeneous memory architectures such as Xeon Phi. Host and Xeon Phi cores have different memory capacities and compute characteristics, but the global symmetric memory allocation in the current OpenSHMEM standard mandates that the same amount of memory be allocated on every process. In this paper, we propose extensions to overcome this restriction, along with high-performance runtime-level designs for efficient communication involving Xeon Phi processors. Further, we redesign applications to demonstrate the effectiveness of the proposed designs and extensions. Experimental evaluations indicate a 4X to 7X reduction in the latency of OpenSHMEM data movement operations and a 6X to 11X improvement in the performance of collective operations. Application evaluations in symmetric mode indicate performance improvements of 28% at 1,024 processes. Further, application redesigns using the proposed extensions provide several orders of magnitude of performance improvement compared to the symmetric mode. To the best of our knowledge, this is the first research work that proposes high-performance runtime designs for OpenSHMEM on Intel Xeon Phi clusters.
Date of Conference: 22-26 September 2014
Date Added to IEEE Xplore: 01 December 2014
Electronic ISBN: 978-1-4799-5548-0