Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips

  • Conference paper
Architecture of Computing Systems – ARCS 2018 (ARCS 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10793)

Abstract

Buffer overflows are a serious problem when running message-passing programs on network-on-chip based many-core processors. A simple synchronization mechanism ensures that data is transferred only when nodes need it, which avoids full buffers and interruptions at other times. However, software synchronization cannot fully achieve these objectives, because its own flits may still interrupt nodes or fill buffers. Therefore, we propose a lightweight hardware synchronization. It requires only small architectural changes, as it comprises only very small components, and it scales well. To control our hardware-supported synchronization, we add two new assembler instructions. Furthermore, we show the difference in the software development process and evaluate the impact on the execution time of global communication operations and on the number of required receive buffer slots.

Notes

  1. In distributed systems, there is plenty of buffer space because the main memory and swap space (hard disk) may be employed.

  2. When implementing ready synchronization in software, a particular payload is defined to represent ready flits. In the hardware implementation, a payload is not possible because ready flits do not reach the processing element of a node.

  3. For example, when a send operation takes 100 cycles and a receive operation takes 105 cycles, it takes 20 sends (2000 cycles) to permanently occupy one more buffer slot.

  4. On some architectures, this might be solved with a header flit. However, not all architectures support this approach; see, for example, the RC/MC architecture [10].

  5. They can be downloaded at www.github.com/unia-sik/rcmc.

  6. A broadcast operation with one flit would result in numbers similar to the Barrier and All-to-All broadcast. Therefore, we took a larger broadcast to give an idea of what happens when lots of data is transmitted.

  7. Altera uses the term Logic Element for their elementary logic block, basically a lookup table with 4 inputs and 1 output (4-LUT).
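
The arithmetic behind Note 3 can be reproduced with a short sketch. It assumes a simplified model that is not spelled out in the notes: sender and receiver both run at a fixed period, so the receiver falls behind by the difference of the two periods on every send, and one additional buffer slot stays occupied once that accumulated lag reaches a full send period. The function name `sends_until_extra_slot` is hypothetical and chosen for illustration only.

```python
def sends_until_extra_slot(send_cycles: int, recv_cycles: int) -> int:
    """Return how many sends it takes until the receiver lags by one
    full send period, i.e. until one more buffer slot stays occupied.

    Assumed model (illustrative only): fixed send and receive periods,
    with the receiver slower than the sender.
    """
    if send_cycles <= 0 or recv_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    if recv_cycles <= send_cycles:
        raise ValueError("receiver keeps up, so the backlog never grows")
    lag_per_send = recv_cycles - send_cycles  # cycles lost per send
    # Ceiling division: smallest n with n * lag_per_send >= send_cycles.
    return -(-send_cycles // lag_per_send)


if __name__ == "__main__":
    sends = sends_until_extra_slot(send_cycles=100, recv_cycles=105)
    print(sends, sends * 100)  # prints "20 2000"
```

With the values from Note 3, the receiver loses 5 cycles per send, so the lag reaches 100 cycles after 20 sends, which corresponds to 2000 cycles on the sender side.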

References

  1. Agarwal, A., Iskander, C., Shankar, R.: Survey of network on chip (NoC) architectures & contributions. J. Eng. Comput. Archit. 3(1), 21–27 (2009)

  2. Bjerregaard, T., Mahadevan, S.: A survey of research and practices of network-on-chip. ACM Comput. Surv. (CSUR) 38(1), 1–51 (2006)

  3. Borkar, S.: Future of interconnect fabric: a contrarian view. In: Workshop on System Level Interconnect Prediction, SLIP 2010, pp. 1–2 (2010)

  4. Chrysos, G.: Intel® Xeon Phi coprocessor (codename knights corner). In: Hot Chips 24 Symposium (HCS), 2012 IEEE, pp. 1–31. IEEE (2012)

  5. Coenen, M., Murali, S., Radulescu, A., Goossens, K., De Micheli, G.: A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control. In: Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2006, pp. 130–135. IEEE (2006)

  6. Goossens, K., Dielissen, J., Radulescu, A.: Æthereal network on chip: concepts, architectures, and implementations. IEEE Design Test Comput. 22(5), 414–421 (2005)

  7. Kung, H.T., Morris, R.: Credit-based flow control for ATM networks. IEEE Netw. 9(2), 40–48 (1995)

  8. Kurose, J.F., Ross, K.W.: Computer Networking: A Top-Down Approach. Pearson, London (2012)

  9. Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, Version 3.1. High Performance Computing Center Stuttgart (HLRS) (2015). http://mpi-forum.org/docs/mpi-3.1/mpi31-report-book.pdf

  10. Mische, J., Frieb, M., Stegmeier, A., Ungerer, T.: Reduced complexity many-core: timing predictability due to message-passing. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds.) ARCS 2017. LNCS, vol. 10172, pp. 139–151. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54999-6_11

  11. Mische, J., Ungerer, T.: Low power flitwise routing in an unidirectional torus with minimal buffering. In: Proceedings of the Fifth International Workshop on Network on Chip Architectures, NoCArc 2012, pp. 63–68. ACM, New York (2012)

  12. Mische, J., Ungerer, T.: Guaranteed service independent of the task placement in NoCs with torus topology. In: Proceedings of the 22nd International Conference on Real-Time Networks and Systems, RTNS 2014, pp. 151–160. ACM, New York (2014)

  13. Rattner, J.: An experimental many-core processor from Intel Labs. Presentation (2010). http://download.intel.com/pressroom/pdf/rockcreek/SCC_Announcement_JustinRattner.pdf

  14. Raynal, M., Helary, J.M.: Synchronization and Control of Distributed Systems and Programs. Wiley Series in Parallel Computing. Wiley, Chichester (1990). (Trans: Synchronisation et contrôle des systèmes et des programmes réparties, Paris, Eyrolles). http://cds.cern.ch/record/223733

  15. Tanenbaum, A.S., Van Steen, M.: Distributed Systems: Principles and Paradigms, 2nd edn. Prentice-Hall, Upper Saddle River (2007)

  16. Tanenbaum, A.S., Wetherall, D.J.: Computer Networks. Pearson, London (2010)

  17. Vangal, S.R., Howard, J., Ruhl, G., Dighe, S., Wilson, H., Tschanz, J., Finan, D., Singh, A., Jacob, T., Jain, S., Erraguntla, V., Roberts, C., Hoskote, Y., Borkar, N., Borkar, S.: An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS. IEEE J. Solid-State Circ. 43(1), 29–41 (2008)

  18. Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown III, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5), 15–31 (2007)

Acknowledgement

The authors thank Ingo Sewing for his efforts implementing our lightweight hardware synchronization in the RC/MC architecture.

Author information

Corresponding author

Correspondence to Martin Frieb.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Frieb, M., Stegmeier, A., Mische, J., Ungerer, T. (2018). Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips. In: Berekovic, M., Buchty, R., Hamann, H., Koch, D., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2018. ARCS 2018. Lecture Notes in Computer Science, vol 10793. Springer, Cham. https://doi.org/10.1007/978-3-319-77610-1_9

  • DOI: https://doi.org/10.1007/978-3-319-77610-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77609-5

  • Online ISBN: 978-3-319-77610-1

  • eBook Packages: Computer Science, Computer Science (R0)
