Abstract
Interconnection networks increasingly bottleneck the performance of datacenters and HPC systems because of ever-growing communication overhead. High-radix switches are widely deployed in interconnection networks to achieve higher throughput and lower latency. However, network latency can deteriorate severely under bursty and micro-bursty traffic. In this paper, we propose a Prefetch and Prediction based Switch (PPS) that effectively reduces packet delay and eliminates the effect of traffic bursts. By using a dynamic allocation multiple queueing (DAMQ) buffer with data prefetch, PPS performs concurrent writes and reads with zero delay, which fully pipelines packet scheduling. We further propose a simple but efficient arbitration scheme that completes a packet arbitration within one clock cycle while maintaining high throughput. Moreover, by predicting the arbitration results and filtering out requests that are likely to fail in the next round, our scheduling algorithm delivers performance indistinguishable from that of iSLIP, with nearly half of iSLIP's area and 36.37% fewer logic units (LUTs). Owing to the DAMQ buffer with control data prefetch and the two-level scheduling with arbitration prediction, PPS achieves low latency and high throughput. PPS can also be extended easily to a higher radix, because its hardware complexity grows linearly with the number of ports.
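To make the DAMQ idea concrete, the following is a minimal software sketch of a generic dynamically allocated multi-queue buffer: several FIFO queues share one pool of slots, and each queue is kept as a linked list of slot indices. It is an illustration under stated assumptions, not the PPS hardware design. The class name `DAMQBuffer`, its methods, and the slot layout are hypothetical, and the prefetch and concurrent read/write features described in the abstract are not modeled.

```python
class DAMQBuffer:
    """Hypothetical sketch: FIFO queues sharing one slot pool via linked lists."""

    def __init__(self, num_slots, num_queues):
        self.data = [None] * num_slots
        # next_slot[i] links slot i to its successor; -1 marks the end of a list.
        self.next_slot = list(range(1, num_slots)) + [-1] if num_slots else []
        self.free_head = 0 if num_slots > 0 else -1  # head of the free list
        self.head = [-1] * num_queues                # first slot of each queue
        self.tail = [-1] * num_queues                # last slot of each queue

    def enqueue(self, queue, packet):
        """Append packet to `queue`; return False when the shared pool is full."""
        slot = self.free_head
        if slot == -1:
            return False
        self.free_head = self.next_slot[slot]        # pop a slot off the free list
        self.data[slot] = packet
        self.next_slot[slot] = -1
        if self.head[queue] == -1:
            self.head[queue] = slot                  # queue was empty
        else:
            self.next_slot[self.tail[queue]] = slot  # link behind the old tail
        self.tail[queue] = slot
        return True

    def dequeue(self, queue):
        """Remove and return the oldest packet of `queue`, or None if empty."""
        slot = self.head[queue]
        if slot == -1:
            return None
        packet = self.data[slot]
        self.head[queue] = self.next_slot[slot]
        if self.head[queue] == -1:
            self.tail[queue] = -1                    # queue became empty
        self.data[slot] = None
        self.next_slot[slot] = self.free_head        # push the slot back on the free list
        self.free_head = slot
        return packet
```

Because slots are taken from a common free list, a queue that receives a burst can occupy most of the pool while idle queues consume nothing, which is the property that makes DAMQ-style buffers attractive under bursty traffic.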
This research was supported by the 863 Program of China (2018YFB2202303, 2016YFB0200200), NSFC (61972412, 61832018), and the national pre-research project (31511010202).
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dai, Y., Wu, K., Lai, M., Li, Q., Dong, D. (2020). PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction. In: Wen, S., Zomaya, A., Yang, L. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11944. Springer, Cham. https://doi.org/10.1007/978-3-030-38991-8_1
DOI: https://doi.org/10.1007/978-3-030-38991-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38990-1
Online ISBN: 978-3-030-38991-8
eBook Packages: Mathematics and Statistics (R0)