Static timing analysis for modeling QoS in networks-on-chip

https://doi.org/10.1016/j.jpdc.2010.10.003

Abstract

Networks-on-chip (NoCs) are used in a growing number of SoCs and multi-core processors. Because messages compete for the NoC’s shared resources, quality of service and resource allocation are major concerns for system designers. In particular, a model for the properties of packet delivery through the network is desirable. We present a methodology for packet-level static timing analysis in NoCs. Our methodology quickly and accurately gauges the performance parameters of a virtual-channel wormhole NoC without simulation. The network model can handle any topology, link capacities, and buffer sizes. It provides per-flow delay analysis that is orders-of-magnitude faster than simulation while being significantly more accurate than prior static modeling techniques. Using a carefully derived and reduced Markov chain, the model can statically represent the dynamic network state. Usage of the model in a placement optimization problem is shown as an example application.

Introduction

Networks-on-chip (NoCs) are increasingly used instead of buses and dedicated signal wires in large-scale processors and, even more so, in modern systems-on-chip (SoCs) [5]. In NoC-based systems, data transmission takes the form of multi-packet flows routed through the NoC over multiple links and routers. Because network resources are shared, per-flow quality of service (QoS) becomes a major concern for the system architect. Packet delay and data throughput are primary metrics of QoS [9].

The purpose of this paper is to rigorously derive a delay and throughput model for packet-level static timing analysis (STA) for NoC-based SoCs. Static timing analysis in a shared network is a non-simulation-based technique to estimate the average delay of each flow in the network, given the network topology, link capacities, router architecture, and the bandwidth requirements and characteristics of all flows.
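As an illustration of the inputs such an analysis consumes, the following minimal sketch accumulates the bandwidth that each flow places on every directed link of a mesh. It assumes dimension-ordered (XY) routing, tile coordinates as `(x, y)` tuples, and bandwidths expressed as a fraction of a unit link capacity. The helper names `xy_route` and `link_loads` are hypothetical, and this load check is only a first-order sanity test, not the delay model derived in the paper.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Return the directed links (tile, next_tile) used by X-then-Y routing."""
    x, y = src
    links = []
    while x != dst[0]:  # travel along the X dimension first
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:  # then travel along the Y dimension
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def link_loads(flows):
    """Sum the bandwidth demand of all flows on each directed link."""
    loads = defaultdict(float)
    for src, dst, bandwidth in flows:
        for link in xy_route(src, dst):
            loads[link] += bandwidth
    return dict(loads)


# Hypothetical example: two flows that share the link (1, 0) -> (2, 0).
flows = [((0, 0), (2, 1), 0.3), ((1, 0), (2, 0), 0.4)]
loads = link_loads(flows)

# Links whose aggregate demand exceeds the assumed unit capacity.
overloaded = {link: load for link, load in loads.items() if load > 1.0}
```

A link that appears in `overloaded` cannot sustain the requested throughput regardless of buffering, so a check of this kind can cheaply reject infeasible configurations before a more detailed per-flow delay analysis is run.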

The motivation for a per-flow STA technique is to enable a range of design optimizations that can rely on accurate and fast network analysis. Methods such as module placement and resource allocation [39], [1] require a large number of iterations, and thus the evaluation of network performance within each iteration must be very efficient. Until now, an accurate and complete modeling of advanced NoCs has only been possible with detailed and time-consuming simulations. The main reason is that network resources, including links, routers, buffers, and ports, are shared between several information flows. Thus, contention can arise, inducing statistical uncertainty in the delay of each packet. Detailed simulation, however, is too slow to be effective within an optimization inner loop because all internal buffers and states must be modeled on a cycle-by-cycle basis.

We present a rigorous analytical model that relies on a carefully constructed and reduced Markov chain to represent network state, including the occupancy of all buffers. Our model is inspired by industrial work-flow modeling techniques and, to the best of our knowledge, is the first that can accurately account for arbitrary network topology, link capacities, and buffering, when using wormhole routing with virtual channels. We rely on the well-developed theory of stochastic processes and show that our technique faithfully predicts network queuing delay for both synthetic and real-world SoC traffic scenarios. We present results and validate the model for the delay analysis of flows with fixed-length packets that are composed of a large number of flits. We discuss extensions to these assumptions as future work. Note that our model is valid for any interconnection network, and not just an NoC, but the types of optimizations it enables are particularly well suited for the tight resource constraints and design flow of NoCs and SoCs.
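To make the idea of representing buffer occupancy with a Markov chain concrete, the sketch below solves for the stationary distribution of a single finite buffer. It is a toy stand-in, not the reduced chain constructed in the paper: it assumes a discrete-time birth-death process in which a flit arrives with probability `p_arrive` and departs with probability `p_serve` in each cycle, and the function names are hypothetical.

```python
def buffer_transition_matrix(capacity, p_arrive, p_serve):
    """Build the transition matrix of a toy finite-buffer chain.

    State k is the number of flits in the buffer (0..capacity).
    Assumption: occupancy changes by at most one flit per cycle.
    """
    n = capacity + 1
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        if k == capacity:
            up = 0.0  # a full buffer blocks arrivals
        elif k == 0:
            up = p_arrive  # nothing can depart from an empty buffer
        else:
            up = p_arrive * (1.0 - p_serve)
        if k == 0:
            down = 0.0
        elif k == capacity:
            down = p_serve
        else:
            down = p_serve * (1.0 - p_arrive)
        if k < capacity:
            P[k][k + 1] = up
        if k > 0:
            P[k][k - 1] = down
        P[k][k] = 1.0 - up - down
    return P


def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Approximate pi with pi = pi * P by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical parameters: a 4-flit buffer that is served faster than it fills.
P = buffer_transition_matrix(capacity=4, p_arrive=0.3, p_serve=0.5)
pi = stationary_distribution(P)
mean_occupancy = sum(k * p for k, p in enumerate(pi))
```

The stationary probabilities give the long-run fraction of cycles spent at each occupancy level, from which averages such as `mean_occupancy` or the probability that the buffer is full can be read off without simulating individual cycles.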

To summarize our contributions:

  • We present the first rigorous NoC model that is based on stochastic theory and show how to represent and solve for the network state using a Markov chain.

  • We show how to account for arbitrary and finite buffering, as well as support wormhole routing and virtual channels. We use network delay analysis as an illustrative example of the modeling technique.

  • We validate our model for statistical traffic behavior using synthetic and real-world scenarios, and discuss why it is more complete and more accurate than prior analytical models.

  • We demonstrate that our model can serve at the core of a design optimization method by showing that it can faithfully choose between multiple placement options in a real-world SoC example, and do so while requiring orders of magnitude less time than simulation. We also show that the most advanced prior-art model fails to make the correct optimization decision.

The rest of the paper is organized as follows. We start by discussing the related work in Section 3. Then, in Section 4, we establish a general analytical model for the average delay of each flow in an arbitrary NoC topology. We evaluate the delay model in Section 5 by comparing it with accurate simulation results and with previous delay models. Finally, Section 6 uses the model in a placement optimization tool and provides more insights and simulation results for the model.

Background and modeled network

The methodology we present in this paper is general and can be used with a wide array of NoC configurations, including arbitrary topology and buffering. In particular, we target NoCs with efficient Wormhole Routing in conjunction with Virtual Channels that use deterministic routing algorithms. This section provides a brief introduction to these techniques. Later in the paper we provide more details on other assumptions we make (Section 4). We also validate the model using a specific 4×4 mesh

Related work

Much of the prior work on analytical delay modeling in wormhole-enabled networks approximates the mean delay of packets in the entire system rather than estimating the delay of each source-destination flow separately [35], [30], [26], [24], [3]. Such gross approximations are often inadequate, and in such cases cannot be used in the NoC design process to efficiently optimize the allocation of resources.

In addition, while state-of-the-art NoC architectures multiplex multiple packets on the

Analytical model

In this section, we derive an analytical approximation of the delay of a flow in a wormhole network. The model supports any network topology and arbitrarily-set link capacities, as well as VC buffer sizes. Likewise, we place no restriction on the routing algorithm except that it be deterministic. To simplify the analysis, we assume that all packets have a fixed length, and are long enough so that the time it takes the head flit to arrive at the destination is considerably shorter than the time

Model validation

This section illustrates how to apply the model to compute network properties such as delay and throughput. We use the scenarios described in Sections 4.1.2 (Sharing a single link with a single flow), 4.1.3 (Sharing a single link with multiple flows), and 4.1.4 (Sharing multiple links) and, for each case, compare the results of our model with detailed simulations and with HDM [21]. Our cycle-accurate, discrete-event, NoC simulator uses the OMNeT++ framework [47]. The simulator simulates wormhole

Benchmark delay model and placement optimization

In this section we demonstrate how our analytical model can be used in the inner-loop of many optimization algorithms, such as module placement, buffer allocation, link capacity allocation, and network topology selection. These inner loops cannot solely rely on simulations, as simulations take too long to complete, making analytical models crucial for an efficient design process. Further, the correctness of the analytical models directly affects the correctness of the optimization algorithm.
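The role of a fast analytical estimate inside an optimization inner loop can be sketched as follows. The example exhaustively searches module placements on a small mesh and scores each candidate with a pluggable cost function. The default cost, `placeholder_cost`, is a deliberately crude bandwidth-weighted hop count that stands in for a per-flow delay model; it is not the model of this paper, and all names and traffic values are hypothetical.

```python
from itertools import permutations


def hop_count(tile_a, tile_b):
    """Manhattan distance between two mesh tiles given as (x, y)."""
    (xa, ya), (xb, yb) = tile_a, tile_b
    return abs(xa - xb) + abs(ya - yb)


def placeholder_cost(placement, flows):
    """Stand-in for an analytical delay estimate: bandwidth-weighted hops."""
    return sum(
        bandwidth * hop_count(placement[src], placement[dst])
        for src, dst, bandwidth in flows
    )


def best_placement(modules, tiles, flows, cost_fn=placeholder_cost):
    """Evaluate every assignment of modules to tiles and keep the cheapest."""
    best, best_cost = None, float("inf")
    for perm in permutations(tiles, len(modules)):
        placement = dict(zip(modules, perm))
        cost = cost_fn(placement, flows)  # the inner-loop evaluation
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


# Hypothetical 2x2 mesh with four modules and three flows (src, dst, bandwidth).
modules = ["cpu", "mem", "dsp", "io"]
tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
flows = [("cpu", "mem", 8.0), ("dsp", "mem", 4.0), ("cpu", "io", 1.0)]

placement, cost = best_placement(modules, tiles, flows)
```

Because `cost_fn` is called once per candidate, its run time dominates the search. Passing a more faithful estimator through `cost_fn` changes which placement wins without changing the loop, which is why both the speed and the correctness of the estimate matter.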

Discussion and future work

Our packet-level static timing analysis (STA) model is constructed based on several important assumptions, which impact its accuracy as discussed below. We assume that flow X is always active, and is thus always competing for link capacity with the interfering flows. We discuss the implications in Section 5.2 and show that the resulting error is both bounded and limited to scenarios where the interfering flows require more throughput than flow X. We plan to extend our model to reduce this

Conclusions

In this paper we introduced a packet-level static timing analysis (STA) for NoCs. We showed how it allows for a quick and precise evaluation of the performance parameters of a virtual-channel wormhole NoC without using any simulation techniques. It can handle any topology, link capacities, and buffer capacities—and unlike existing models, is able to evaluate the performance of a specific flow in a precise manner. Therefore, it enables per-flow QoS for NoCs.

Our new model allows for a per-flow

Acknowledgments

We thank the anonymous reviewers whose comments helped to significantly improve the paper. We also thank Prof. Moshe Sidi for fruitful discussions at the early stages of this research. Finally, we thank Intel Corp., NVIDIA, and the European Research Council Starting Grant No. 210389 for providing equipment and funds in support of this research.

Evgeni Krimer is a Ph.D. student in Electrical and Computer Engineering at the University of Texas at Austin. Evgeni holds a B.Sc. and a M.Sc. in Electrical Engineering from the Technion, the Israel Institute of Technology. He has nearly 10 years of experience at Intel’s Microprocessor Research Lab and Mobile Groups Architecture department. His research interests include massively parallel computer architectures and performance analysis.

References (47)

  • E. Bolotin et al., Cost considerations in network on chip, Integration, the VLSI Journal (2004)
  • E. Bolotin et al., QNoC: QoS architecture and design process for network on chip, Journal of Systems Architecture (2004)
  • J. Draper et al., A comprehensive analytical model for wormhole routing in multicomputer systems, Journal of Parallel and Distributed Computing (1994)
  • T. Ahonen, D. Sigüenza-Tortosa, H. Bin, J. Nurmi, Topology optimization for application-specific networks-on-chip, in:...
  • T. Altiok, Performance Analysis of Manufacturing Systems (1997)
  • N. Alzeidi et al., A new modelling approach of wormhole-switched networks with finite buffers, International Journal of Parallel, Emergent and Distributed Systems (2008)
  • M. Bakhouya et al., Analytical modeling and evaluation of on-chip interconnects using network calculus
  • L. Benini et al., Networks on chips: a new SoC paradigm, Computer (2002)
  • D. Bertozzi et al., Xpipes: a network-on-chip architecture for gigascale systems-on-chip, IEEE Circuits and Systems Magazine (2004)
  • T. Bjerregaard, J. Sparso, A router architecture for connection-oriented service guarantees in the MANGO clockless...
  • P. Brémaud, Markov Chains (1999)
  • J. Buzacott, Stochastic Models of Manufacturing Systems (1993)
  • B. Ciciani, M. Colajanni, C. Paolucci, An accurate model for the performance analysis of deterministic wormhole...
  • Y. Dallery et al., On decomposition methods for tandem queueing networks with blocking, Operations Research (1993)
  • W. Dally, Performance analysis of k-ary n-cube interconnection networks, IEEE Transactions on Computers (1990)
  • W. Dally, Virtual-channel flow control, IEEE Transactions on Parallel and Distributed Systems (1992)
  • W. Dally, Principles and Practices of Interconnection Networks (2004)
  • A. Diamantidis et al., Exact analysis of a two-workstation one-buffer flow line with parallel unreliable machines, European Journal of Operational Research (2008)
  • S.B. Gershwin, An efficient decomposition method for the approximate evaluation of tandem queues with finite storage space and blocking, Operations Research (1987)
  • Z. Guz, I. Walter, E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, Efficient link capacity and QoS design for...
  • Z. Guz et al., Network delays and link capacities in application-specific wormhole NoCs, Journal of VLSI Design (2007)
  • S. Hary et al., Feasibility test for real-time communication using wormhole routing, IEE Proceedings—Computers and Digital Techniques (1997)
  • J. Hu et al., System-level buffer allocation for application-specific networks-on-chip router design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2006)
    Isaac Keslassy received the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 2000 and 2004, respectively. He is currently a faculty member in the electrical engineering department of the Technion, Haifa, Israel. His recent research interests include the design and analysis of high-performance routers and on-chip networks. He is the recipient of the Yigal Alon Fellowship, the ATS-WD Career Development Chair, and the ERC Starting Grant.

    Avinoam Kolodny received his doctorate in microelectronics from Technion—Israel Institute of Technology in 1980. He joined Intel Corporation, where he was engaged in research and development in the areas of device physics, VLSI circuits, electronic design automation, and organizational development. He has been a member of the Faculty of Electrical Engineering at the Technion since 2000. His current research is focused primarily on interconnects in VLSI systems, at both physical and architectural levels.

    Isask’har Walter received his B.Sc., M.Sc., and Ph.D. degrees in Electrical Engineering from the Technion—Israel Institute of Technology, in 2002, 2005, and 2010, respectively. He is currently a post-doc member of the NoC research group in Technion. His research interests include network-on-chip for system on-chip and chip multi-core.

    Mattan Erez is an assistant professor of Electrical and Computer Engineering at the University of Texas at Austin. Mattan holds a B.A. in Physics (Technion, 1999) and a B.Sc. (Technion, 1999), M.S. (Stanford, 2002), and Ph.D. (Stanford, 2007) in Electrical Engineering. His research interests include computer architecture and programming models.
