PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11944)

Abstract

Interconnection networks are increasingly becoming a performance bottleneck for datacenters and HPC systems because of ever-increasing communication overhead. High-radix switches are widely deployed in interconnection networks to achieve higher throughput and lower latency. However, network latency can deteriorate significantly under bursty and micro-bursty traffic. In this paper, we propose a Prefetch and Prediction based Switch (PPS) that effectively reduces packet delay and eliminates the effect of traffic bursts. By using a dynamically allocated multi-queue (DAMQ) buffer with data prefetch, PPS supports concurrent writes and reads with zero delay, thereby fully pipelining packet scheduling. We further propose a simple but efficient arbitration scheme that completes packet arbitration within one clock cycle while maintaining high throughput. Moreover, by predicting arbitration results and filtering out requests that are likely to fail in the next round, our scheduling algorithm achieves performance indistinguishable from that of iSLIP, with nearly half of iSLIP's area and 36.37% fewer look-up tables (LUTs). Owing to the combination of a DAMQ buffer with control-data prefetch and two-level scheduling with arbitration prediction, PPS achieves low latency and high throughput. In addition, PPS can easily extend its switching logic to higher radices, because its hardware complexity grows linearly with the number of ports.
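The DAMQ buffer mentioned in the abstract can be illustrated with a minimal Python sketch, assuming the classic organization in which several logical FIFOs (for example, one per output port) share a single pool of packet slots linked through per-slot next pointers, with unused slots kept on a free list. The class `DAMQBuffer` and its methods are hypothetical names chosen for illustration, not taken from the paper, and the sketch does not model the prefetch logic or the clock-cycle timing that PPS uses to achieve zero-delay concurrent reads and writes.

```python
class DAMQBuffer:
    """Hypothetical DAMQ sketch: `num_slots` shared packet slots dynamically
    divided among `num_queues` logical FIFOs. Each FIFO is a linked list
    threaded through the shared slots; free slots form another linked list.
    Prefetch and cycle timing are NOT modeled here."""

    NIL = -1  # null pointer marker

    def __init__(self, num_slots: int, num_queues: int) -> None:
        assert num_slots >= 1 and num_queues >= 1
        self.data = [None] * num_slots
        # Initially every slot is free and chained: 0 -> 1 -> ... -> NIL.
        self.next = list(range(1, num_slots)) + [self.NIL]
        self.free_head = 0
        self.head = [self.NIL] * num_queues  # per-queue head slot
        self.tail = [self.NIL] * num_queues  # per-queue tail slot
        self.count = [0] * num_queues        # per-queue occupancy

    def enqueue(self, q: int, pkt) -> bool:
        """Append pkt to queue q; return False if the shared pool is full."""
        if self.free_head == self.NIL:
            return False  # no free slot: the sender must wait
        slot = self.free_head              # pop a slot from the free list
        self.free_head = self.next[slot]
        self.data[slot] = pkt
        self.next[slot] = self.NIL
        if self.tail[q] == self.NIL:       # queue was empty
            self.head[q] = slot
        else:                              # link after current tail
            self.next[self.tail[q]] = slot
        self.tail[q] = slot
        self.count[q] += 1
        return True

    def dequeue(self, q: int):
        """Remove and return the head packet of queue q, or None if empty."""
        slot = self.head[q]
        if slot == self.NIL:
            return None
        pkt = self.data[slot]
        self.head[q] = self.next[slot]     # advance head pointer
        if self.head[q] == self.NIL:
            self.tail[q] = self.NIL
        self.data[slot] = None
        self.next[slot] = self.free_head   # push slot back on the free list
        self.free_head = slot
        self.count[q] -= 1
        return pkt


if __name__ == "__main__":
    buf = DAMQBuffer(num_slots=4, num_queues=2)
    for q, pkt in [(0, "a"), (1, "b"), (0, "c"), (0, "d")]:
        buf.enqueue(q, pkt)
    print(buf.enqueue(1, "e"))  # False: all 4 shared slots are in use
    print(buf.dequeue(0))       # a: leaves queue 0 and frees a slot
    print(buf.enqueue(1, "e"))  # True: queue 1 reuses the freed slot
```

Because slots are allocated on demand, one busy output can occupy most of the buffer while idle outputs consume none, which is the usual efficiency argument for DAMQ over statically partitioned queues. The cost is the pointer bookkeeping; in hardware that bookkeeping typically adds read latency, which is the kind of delay a prefetch scheme is meant to hide.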

This research was supported by the 863 Program of China (2018YFB2202303, 2016YFB0200200), the NSFC (61972412, 61832018), and the national pre-research project (31511010202).

Author information

Correspondence to Yi Dai.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dai, Y., Wu, K., Lai, M., Li, Q., Dong, D. (2020). PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction. In: Wen, S., Zomaya, A., Yang, L. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol. 11944. Springer, Cham. https://doi.org/10.1007/978-3-030-38991-8_1
