Abstract
To remain economically viable, an HPC cluster must be deployed under tight hardware and software cost constraints, prompting researchers to reconsider the design of modern HPC platforms. In this paper we present a cross-layer communication architecture suitable for emerging HPC platforms based on heterogeneous multiprocessors. We propose simple hardware primitives that enable protected, reliable, and virtualized user-level communication and that can easily be integrated in the same package with the processing unit. Combined with an efficient user-space software stack, the proposed architecture provides low-latency communication mechanisms to HPC applications. Our implementation of the MPI standard exploits these capabilities to deliver point-to-point and collective primitives with low overheads, including an eager protocol with an end-to-end latency of 1.4 \(\upmu \mathrm{s}\). We port and evaluate our communication stack with real HPC applications on a cluster of 128 ARMv8 processors tightly coupled with FPGA logic. The network interface primitives occupy less than 25% of the FPGA logic and only 3 Mbits of SRAM, while easily saturating the 16 Gb/s links of our platform.
Notes
1. System MMU in the case of ARM processors.
2. Note that, at the hardware level, the transfer has been separately acknowledged from the RX engine on node B to the TX engine on node A.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Ploumidis, M. et al. (2019). Software and Hardware Co-design for Low-Power HPC Platforms. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science(), vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science (R0)