PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction

Dai, Yi; Wu, Ke; Lai, Mingche; Li, Qiong; Dong, Dezun

doi:10.1007/978-3-030-38991-8_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11944)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Interconnection networks increasingly bottleneck the performance of datacenters and HPC systems because of ever-increasing communication overhead. High-radix switches are widely deployed in interconnection networks to achieve higher throughput and lower latency. However, traffic bursts and micro-bursts can severely degrade network latency. In this paper, we propose a Prefetch and Prediction based Switch (PPS), which effectively reduces packet delay and eliminates the effect of traffic bursts. By using a dynamically allocated multi-queue (DAMQ) buffer with data prefetch, PPS supports concurrent writes and reads with zero delay, so that packet scheduling is fully pipelined. We further propose a simple but efficient arbitration scheme that completes a packet arbitration within one clock cycle while maintaining high throughput. Moreover, by predicting the arbitration results and filtering out requests that are likely to fail in the next round, our scheduling algorithm performs indistinguishably from iSLIP, with nearly half of iSLIP's area and 36.37% fewer logic units (LUTs). Owing to the DAMQ buffer with control-data prefetch and the two-level scheduling with arbitration prediction, PPS achieves low latency and high throughput. PPS can also be easily extended to a higher radix, because its hardware complexity grows linearly with the number of ports.
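To make the DAMQ idea concrete, the following is a minimal software sketch of a generic dynamically allocated multi-queue buffer: one shared pool of slots holds a linked list per output queue, plus a free list. It is an illustration under assumptions, not the PPS hardware design. The class name `DAMQBuffer`, its methods, and the slot layout are hypothetical, and the prefetch and arbitration logic described in the abstract are not modelled.

```python
from typing import List, Optional


class DAMQBuffer:
    """Illustrative shared buffer with one linked-list queue per output.

    Hypothetical sketch: names and layout are assumptions, not the
    structure used in the PPS paper.
    """

    def __init__(self, num_slots: int, num_queues: int) -> None:
        if num_slots <= 0 or num_queues <= 0:
            raise ValueError("num_slots and num_queues must be positive")
        self.data: List[Optional[object]] = [None] * num_slots
        # next_slot[i] is the slot that follows slot i in its list (-1 = end).
        self.next_slot: List[int] = list(range(1, num_slots)) + [-1]
        self.free_head = 0                 # head of the free-slot list
        self.head = [-1] * num_queues      # first slot of each queue
        self.tail = [-1] * num_queues      # last slot of each queue

    def enqueue(self, queue: int, packet: object) -> bool:
        """Store a packet in any free slot; return False if the pool is full."""
        slot = self.free_head
        if slot == -1:
            return False
        self.free_head = self.next_slot[slot]
        self.data[slot] = packet
        self.next_slot[slot] = -1
        if self.head[queue] == -1:
            self.head[queue] = slot
        else:
            self.next_slot[self.tail[queue]] = slot
        self.tail[queue] = slot
        return True

    def dequeue(self, queue: int) -> Optional[object]:
        """Remove the oldest packet of a queue and recycle its slot."""
        slot = self.head[queue]
        if slot == -1:
            return None
        packet = self.data[slot]
        self.head[queue] = self.next_slot[slot]
        if self.head[queue] == -1:
            self.tail[queue] = -1
        self.data[slot] = None
        # Push the released slot back onto the free list.
        self.next_slot[slot] = self.free_head
        self.free_head = slot
        return packet
```

Because every queue draws from the same pool, a burst toward one output can occupy slots that idle queues are not using, which is the usual motivation for DAMQ over statically partitioned buffers.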

This research was supported by the 863 Program of China (2018YFB2202303, 2016YFB0200200), NSFC (61972412, 61832018), and the national pre-research project (31511010202).

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, China
Yi Dai, Ke Wu, Mingche Lai, Qiong Li & Dezun Dong

Corresponding author

Correspondence to Yi Dai.

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Melbourne, VIC, Australia
Sheng Wen
School of Computer Science, The University of Sydney, Camperdown, NSW, Australia
Albert Zomaya
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang

About this paper

Cite this paper

Dai, Y., Wu, K., Lai, M., Li, Q., Dong, D. (2020). PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction. In: Wen, S., Zomaya, A., Yang, L. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11944. Springer, Cham. https://doi.org/10.1007/978-3-030-38991-8_1

DOI: https://doi.org/10.1007/978-3-030-38991-8_1
Published: 22 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38990-1
Online ISBN: 978-3-030-38991-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
