Networking and communication challenges for post-exascale systems

Panda, Dhabaleswar; Lu, Xiao-Yi; Subramoni, Hari

doi:10.1631/FITEE.1800631

Perspective
Published: 28 November 2018

Volume 19, pages 1230–1235, (2018)
Frontiers of Information Technology & Electronic Engineering

Abstract

With significant advances in emerging processor, memory, and networking technologies, exascale systems will become available in the next few years (2020–2022). As exascale systems begin to be deployed and used, there will be a continuous demand to run next-generation applications with finer granularity, finer time-steps, and increased data sizes. Based on historical trends, next-generation applications will require post-exascale systems during 2025–2035. In this study, we focus on the networking and communication challenges for post-exascale systems. Firstly, we present an envisioned architecture for post-exascale systems. Secondly, we summarize the challenges from different perspectives: heterogeneous networking technologies, high-performance communication and synchronization protocols, integrated support with accelerators and field-programmable gate arrays, fault-tolerance and quality-of-service support, energy-aware communication schemes and protocols, software-defined networking, and scalable communication protocols with heterogeneous memory and storage. Thirdly, we present the challenges in designing efficient programming model support for high-performance computing, big data, and deep learning on these systems. Finally, we emphasize the critical need for co-designing the runtime with upper layers on these systems to achieve maximum performance and scalability.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Ohio State University, Ohio, 43210, USA
Dhabaleswar Panda, Xiao-Yi Lu & Hari Subramoni

Corresponding author

Correspondence to Dhabaleswar Panda.

Additional information

Project supported by the National Science Foundation of the USA (Nos. IIS-1447804 and CNS-1513120)

About this article

Cite this article

Panda, D., Lu, XY. & Subramoni, H. Networking and communication challenges for post-exascale systems. Frontiers Inf Technol Electronic Eng 19, 1230–1235 (2018). https://doi.org/10.1631/FITEE.1800631

Received: 09 October 2018
Revised: 15 October 2018
Accepted: 15 October 2018
Published: 28 November 2018
Issue Date: October 2018
DOI: https://doi.org/10.1631/FITEE.1800631

Key words

CLC number

TP311
