Running HPC applications on many million cores Cloud


Abstract:

Despite various hardware and software improvements in Cloud architecture, a large performance gap remains between commodity supercomputers and the Cloud when running communication-intensive HPC applications. To find out what prevents these applications from scaling better on the Cloud, we evaluated the HPL and NAMD benchmarks on an HPE OpenStack testbed, and the NAMD benchmarks on the supercomputer at the Rijeka University Supercomputing Center. Our results revealed two major bottlenecks: the throughput of the interconnect, and the Cloud orchestration layer, which is responsible for, among other things, managing communication between Cloud instances. We also investigated the influence of jitter, but did not find a significant effect on performance. Our conclusion is that increasing the interconnect throughput alone will not improve the scalability of communication-intensive HPC applications in the Cloud. This is also supported by NAMD runs performed at HP Labs and by HPL benchmark runs performed at the San Diego Supercomputing Center. We propose two possible scenarios for improving scalability: one with a distributed model of the Cloud orchestration layer, and another with bare-metal containers. Efficient load balancing remains a must if we want to see HPC applications scale over many millions of Cloud cores. For this, we propose a novel SLEM-based load balancing strategy.
Date of Conference: 22-26 May 2017
Date Added to IEEE Xplore: 13 July 2017
Conference Location: Opatija, Croatia