Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips

Frieb, Martin; Stegmeier, Alexander; Mische, Jörg; Ungerer, Theo

doi:10.1007/978-3-319-77610-1_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10793)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

Buffer overflows are a serious problem when running message-passing programs on network-on-chip based many-core processors. A simple synchronization mechanism ensures that data is transferred when nodes need it, thereby avoiding full buffers and interruptions at any other time. However, software synchronization cannot completely achieve these objectives, because its flits may still interrupt nodes or fill buffers. Therefore, we propose a lightweight hardware synchronization. It requires only small architectural changes, because it comprises only very small components, and it scales well. To control our hardware-supported synchronization, we add two new assembler instructions. Furthermore, we show the difference in the software development process and evaluate the impact on the execution time of global communication operations and on the number of required receive buffer slots.
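The effect described in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the mechanism proposed in the chapter: it assumes one sender and one receiver with constant send and receive durations, ignores the transfer time of the ready signal itself, and counts the message currently being consumed as still occupying a slot. The function name `peak_backlog` and all parameters are hypothetical.

```python
def peak_backlog(num_messages, send_cycles, receive_cycles, ready_sync):
    """Return the largest number of messages held at the receiver at once.

    Toy model (assumption): a message counts as held from its arrival
    until the receiver has finished consuming it.
    """
    arrival = 0            # arrival time of the most recent message
    receiver_free_at = 0   # time at which the receiver finishes its current message
    finish_times = []      # completion time of every message so far
    peak = 0
    for _ in range(num_messages):
        if ready_sync:
            # The sender waits until the receiver has signalled that it is ready.
            start = max(arrival, receiver_free_at)
        else:
            # The sender transmits back to back, regardless of the receiver.
            start = arrival
        arrival = start + send_cycles
        begin = max(arrival, receiver_free_at)
        receiver_free_at = begin + receive_cycles
        finish_times.append(receiver_free_at)
        # Messages that have arrived but are not yet fully consumed.
        backlog = sum(1 for finish in finish_times if finish > arrival)
        peak = max(peak, backlog)
    return peak


unsynchronized = peak_backlog(100, 100, 105, ready_sync=False)
synchronized = peak_backlog(100, 100, 105, ready_sync=True)
print(unsynchronized, synchronized)
```

In this model the unsynchronized backlog grows with the number of messages whenever the receiver is slower than the sender, while the synchronized backlog stays at a single message, at the cost of the sender waiting.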

Notes

1. In distributed systems, there is plenty of buffer space because the main memory and swap space (hard disk) may be employed.
2. When implementing ready synchronization in software, a particular payload is defined to represent ready flits. In the hardware implementation, a payload is not possible because ready flits do not reach the processing element of a node.
3. For example, when a send operation takes 100 cycles and a receive operation takes 105 cycles, it takes 20 sends (2000 cycles) to permanently occupy one more buffer slot.
4. On some architectures, this might be solved with a header flit. However, not all architectures support this approach; see, for example, the RC/MC architecture [10].
5. They can be downloaded at www.github.com/unia-sik/rcmc.
6. A broadcast operation with one flit would result in numbers similar to the Barrier and All-to-All broadcast. Therefore, we took a larger broadcast to give an idea of what happens when a lot of data is transmitted.
7. Altera uses the term Logic Element for its elementary logic block, basically a lookup table with 4 inputs and 1 output (4-LUT).
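The arithmetic in note 3 can be reproduced with a short calculation. This is a minimal sketch that assumes constant send and receive durations and follows the note's reasoning: the receiver falls behind by the difference between the two durations for every message, and one more buffer slot is permanently occupied once the accumulated lag equals one send duration. The function name `sends_per_extra_slot` is hypothetical.

```python
def sends_per_extra_slot(send_cycles, receive_cycles):
    """Number of sends until the receiver has fallen one full send behind."""
    lag_per_message = receive_cycles - send_cycles
    if lag_per_message <= 0:
        raise ValueError("the receiver keeps up, so no backlog builds")
    # Ceiling division: the lag must reach at least one send duration.
    return -(-send_cycles // lag_per_message)


sends = sends_per_extra_slot(100, 105)
cycles = sends * 100
print(sends, cycles)
```

With the values from the note (100 and 105 cycles) this gives 20 sends, or 2000 cycles.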

References

1. Agarwal, A., Iskander, C., Shankar, R.: Survey of network on chip (NoC) architectures & contributions. J. Eng. Comput. Archit. 3(1), 21–27 (2009)
2. Bjerregaard, T., Mahadevan, S.: A survey of research and practices of network-on-chip. ACM Comput. Surv. (CSUR) 38(1), 1–51 (2006)
3. Borkar, S.: Future of interconnect fabric: a contrarian view. In: Workshop on System Level Interconnect Prediction, SLIP 2010, pp. 1–2 (2010)
4. Chrysos, G.: Intel® Xeon Phi coprocessor (codename knights corner). In: Hot Chips 24 Symposium (HCS), 2012 IEEE, pp. 1–31. IEEE (2012)
5. Coenen, M., Murali, S., Rădulescu, A., Goossens, K., De Micheli, G.: A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control. In: Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, CODES+ ISSS 2006, pp. 130–135. IEEE (2006)
6. Goossens, K., Dielissen, J., Radulescu, A.: Æthereal network on chip: concepts, architectures, and implementations. IEEE Design Test Comput. 22(5), 414–421 (2005)
7. Kung, H.T., Morris, R.: Credit-based flow control for ATM networks. IEEE Netw. 9(2), 40–48 (1995)
8. Kurose, J.F., Ross, K.W.: Computer Networking: A Top-Down Approach. Pearson, London (2012)
9. Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, Version 3.1. High Performance Computing Center Stuttgart (HLRS) (2015). http://mpi-forum.org/docs/mpi-3.1/mpi31-report-book.pdf
10. Mische, J., Frieb, M., Stegmeier, A., Ungerer, T.: Reduced complexity many-core: timing predictability due to message-passing. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds.) ARCS 2017. LNCS, vol. 10172, pp. 139–151. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54999-6_11
11. Mische, J., Ungerer, T.: Low power flitwise routing in an unidirectional torus with minimal buffering. In: Proceedings of the Fifth International Workshop on Network on Chip Architectures, NoCArc 2012, pp. 63–68. ACM, New York (2012)
12. Mische, J., Ungerer, T.: Guaranteed service independent of the task placement in NoCs with torus topology. In: Proceedings of the 22nd International Conference on Real-Time Networks and Systems, RTNS 2014, pp. 151–160. ACM, New York (2014)
13. Rattner, J.: An experimental many-core processor from Intel Labs. Presentation (2010). http://download.intel.com/pressroom/pdf/rockcreek/SCC_Announcement_JustinRattner.pdf
14. Raynal, M., Helary, J.M.: Synchronization and Control of Distributed Systems and Programs. Wiley Series in Parallel Computing. Wiley, Chichester (1990). (Trans: Synchronisation et contrôle des systèmes et des programmes réparties, Paris, Eyrolles). http://cds.cern.ch/record/223733
15. Tanenbaum, A.S., Van Steen, M.: Distributed Systems: Principles and Paradigms, 2nd edn. Prentice-Hall, Upper Saddle River (2007)
16. Tanenbaum, A.S., Wetherall, D.J.: Computer Networks. Pearson, London (2010)
17. Vangal, S.R., Howard, J., Ruhl, G., Dighe, S., Wilson, H., Tschanz, J., Finan, D., Singh, A., Jacob, T., Jain, S., Erraguntla, V., Roberts, C., Hoskote, Y., Borkar, N., Borkar, S.: An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS. IEEE J. Solid-State Circ. 43(1), 29–41 (2008)
18. Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown III, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5), 15–31 (2007)

Acknowledgement

The authors thank Ingo Sewing for his efforts implementing our lightweight hardware synchronization in the RC/MC architecture.

Author information

Authors and Affiliations

Institute of Computer Science, University of Augsburg, 86159, Augsburg, Germany
Martin Frieb, Alexander Stegmeier, Jörg Mische & Theo Ungerer

Corresponding author

Correspondence to Martin Frieb.

Editor information

Editors and Affiliations

Chair for Chip Design for Embedded Computing, Technische Universität Braunschweig, Braunschweig, Germany
Mladen Berekovic
Chair for Chip Design for Embedded Computing, Technische Universität Braunschweig, Braunschweig, Germany
Rainer Buchty
Institute of Computer Engineering, Universität zu Lübeck, Lübeck, Germany
Heiko Hamann
School of Computer Science, The University of Manchester, Manchester, United Kingdom
Dirk Koch
Institute for Information Technology and Communications, Otto-von-Guericke Universität Magdeburg, Magdeburg, Germany
Thilo Pionteck

Cite this paper

Frieb, M., Stegmeier, A., Mische, J., Ungerer, T. (2018). Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips. In: Berekovic, M., Buchty, R., Hamann, H., Koch, D., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2018. ARCS 2018. Lecture Notes in Computer Science, vol 10793. Springer, Cham. https://doi.org/10.1007/978-3-319-77610-1_9

DOI: https://doi.org/10.1007/978-3-319-77610-1_9
Published: 08 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77609-5
Online ISBN: 978-3-319-77610-1
eBook Packages: Computer Science, Computer Science (R0)
