Abstract
Buffer overflows are a serious problem when running message-passing programs on network-on-chip based many-core processors. A simple synchronization mechanism ensures that data is transferred when nodes need it. Thereby, it avoids full buffers and interruption at any other time. However, software synchronization is not able to completely achieve these objectives, because its flits may still interrupt nodes or fill buffers. Therefore, we propose a lightweight hardware synchronization. It requires only small architectural changes as it comprises only very small components and it scales well. For controlling our hardware supported synchronization, we add two new assembler instructions. Furthermore, we show the difference in the software development process and evaluate the impact on the execution time of global communication operations and required receive buffer slots.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
At distributed systems, there is plenty of buffer space because the main memory and swap space (hard disk) may be employed.
- 2.
When implementing ready synchronization in software, a particular payload is defined to represent ready flits. In the hardware implementation, a payload is not possible because ready flits do not reach the processing element of a node.
- 3.
For example, when a send operation takes 100 cycles and a receive operation takes 105 cycles, it takes 20 sends (2000 cycles) to permanently occupy one more buffer slot.
- 4.
At some architectures, this might be solved with a header flit. However, not all architectures support this approach, see for example the RC/MC architecture [10].
- 5.
They can be downloaded at www.github.com/unia-sik/rcmc.
- 6.
A broadcast operation with one flit would result in numbers similar to the Barrier and All-to-All broadcast. Therefore, we took a larger broadcast to give an idea about what happens when lots of data is transmitted.
- 7.
Altera uses the term Logic Element for their elementary logic block, basically a lookup table with 4 inputs and 1 output (4-LUT).
References
Agarwal, A., Iskander, C., Shankar, R.: Survey of network on chip (NoC) architectures & contributions. J. Eng. Comput. Archit. 3(1), 21–27 (2009)
Bjerregaard, T., Mahadevan, S.: A survey of research and practices of network-on-chip. ACM Comput. Surv. (CSUR) 38(1), 1–51 (2006)
Borkar, S.: Future of interconnect fabric: a contrarian view. In: Workshop on System Level Interconnect Prediction, SLIP 2010, pp. 1–2 (2010)
Chrysos, G.: Intel® Xeon Phi coprocessor (codename knights corner). In: Hot Chips 24 Symposium (HCS), 2012 IEEE, pp. 1–31. IEEE (2012)
Coenen, M., Murali, S., Ruadulescu, A., Goossens, K., De Micheli, G.: A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control. In: Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, CODES+ ISSS 2006, pp. 130–135. IEEE (2006)
Goossens, K., Dielissen, J., Radulescu, A.: Æthereal network on chip: concepts, architectures, and implementations. IEEE Design Test Comput. 22(5), 414–421 (2005)
Kung, H.T., Morris, R.: Credit-based flow control for ATM networks. IEEE Netw. 9(2), 40–48 (1995)
Kurose, J.F., Ross, K.W.: Computer Networking: A Top-Down Approach. Pearson, London (2012)
Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, Version 3.1. High Performance Computing Center Stuttgart (HLRS) (2015). http://mpi-forum.org/docs/mpi-3.1/mpi31-report-book.pdf
Mische, J., Frieb, M., Stegmeier, A., Ungerer, T.: Reduced complexity many-core: timing predictability due to message-passing. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds.) ARCS 2017. LNCS, vol. 10172, pp. 139–151. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54999-6_11
Mische, J., Ungerer, T.: Low power flitwise routing in an unidirectional torus with minimal buffering. In: Proceedings of the Fifth International Workshop on Network on Chip Architectures, NoCArc 2012, pp. 63–68. ACM, New York (2012)
Mische, J., Ungerer, T.: Guaranteed service independent of the task placement in NoCs with torus topology. In: Proceedings of the 22nd International Conference on Real-Time Networks and Systems, RTNS 2014, pp. 151–160. ACM, New York (2014)
Rattner, J.: An experimental many-core processor from Intel Labs. Presentation (2010). http://download.intel.com/pressroom/pdf/rockcreek/SCC_Announcement_JustinRattner.pdf
Raynal, M., Helary, J.M.: Synchronization and Control of Distributed Systems and Programs. Wiley Series in Parallel Computing. Wiley, Chichester (1990). (Trans: Synchronisation et contrôle des systèmes et des programmes réparties, Paris, Eyrolles). http://cds.cern.ch/record/223733
Tanenbaum, A.S., Van Steen, M.: Distributed Systems: Principles and Paradigms, 2nd edn. Prentice-Hall, Upper Saddle River (2007)
Tanenbaum, A.S., Wetherall, D.J.: Computer Networks. Pearson, London (2010)
Vangal, S.R., Howard, J., Ruhl, G., Dighe, S., Wilson, H., Tschanz, J., Finan, D., Singh, A., Jacob, T., Jain, S., Erraguntla, V., Roberts, C., Hoskote, Y., Borkar, N., Borkar, S.: An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS. IEEE J. Solid-State Circ. 43(1), 29–41 (2008)
Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown III, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5), 15–31 (2007)
Acknowledgement
The authors thank Ingo Sewing for his efforts implementing our lightweight hardware synchronization in the RC/MC architecture.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Frieb, M., Stegmeier, A., Mische, J., Ungerer, T. (2018). Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips. In: Berekovic, M., Buchty, R., Hamann, H., Koch, D., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2018. ARCS 2018. Lecture Notes in Computer Science(), vol 10793. Springer, Cham. https://doi.org/10.1007/978-3-319-77610-1_9
Download citation
DOI: https://doi.org/10.1007/978-3-319-77610-1_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77609-5
Online ISBN: 978-3-319-77610-1
eBook Packages: Computer ScienceComputer Science (R0)