Abstract:
We propose a novel high-performance computing (HPC) network architecture, $\mathrm{HFOS}_{L}$, based on $L$ parallel levels of distributed low-radix fast optical switches (FOS). We provide a detailed description of the blade, the FOS, and the operation of the $\mathrm{HFOS}_{L}$ network. The $\mathrm{HFOS}_{L}$ network is highly scalable: the $\mathrm{HFOS}_{4}$ architecture can support an extremely large HPC network of 65,536 blades using distributed FOSes with the same radix of 16 at each level. In principle, the $\mathrm{HFOS}_{L}$ network can also be built from FOSes with different radices at each level. To find the best FOS configuration at each level and solve the energy and cost optimization problem in the $\mathrm{HFOS}_{L}$ network, we break the FOS down into its components and develop energy and cost models for it. We verify that the energy-per-radix and cost-per-radix functions of the FOS are convex. Given this foundation, our theoretical analysis of the optimization problem shows that the $\mathrm{HFOS}_{L}$ network can achieve minimum energy and cost only when the FOS radices of all levels are the same. In addition, the cost and power consumption of $\mathrm{HFOS}_{L}$ networks are compared with those of a widely used Leaf-Spine network.
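The scaling figure and the equal-radix result can be illustrated with a small numerical sketch. It rests on assumptions that the abstract does not state: that the number of blades equals the product of the per-level radices (which is consistent with 16^4 = 65,536), that the normalised network cost is the sum of a per-radix cost over the levels, and that the per-radix cost has the purely hypothetical convex form `fixed / r + per_port * r`. The function names and constants below are illustrative and are not taken from the paper's models.

```python
from itertools import combinations_with_replacement
from math import prod


def blade_count(radices):
    """Blades supported, assuming the count is the product of the level radices."""
    return prod(radices)


def per_radix_cost(r, fixed=256.0, per_port=1.0):
    """Hypothetical convex cost per port of a radix-r FOS (not the paper's model)."""
    return fixed / r + per_port * r


def total_cost(radices):
    """Assumed normalised network cost: sum of per-radix costs over all levels."""
    return sum(per_radix_cost(r) for r in radices)


def best_radices(levels, blades, candidates=(2, 4, 8, 16, 32, 64, 128, 256)):
    """Brute-force the cheapest radix assignment that supports exactly `blades`."""
    feasible = [
        combo
        for combo in combinations_with_replacement(candidates, levels)
        if prod(combo) == blades
    ]
    return min(feasible, key=total_cost)


if __name__ == "__main__":
    print(blade_count([16] * 4))        # 65536
    print(best_radices(4, 65536))       # (16, 16, 16, 16)
```

Under these assumptions the exhaustive search returns the same radix at every level, in line with the abstract's claim. Changing `fixed` and `per_port` alters the cost values but, for this cost form, not the conclusion that unequal radices are more expensive.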
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 1, February 2024)