
1 Introduction

Formal specification and verification of networks has become a reality in recent years with the emergence of network-specific programming languages and property-checking tools. Programming languages like Frenetic [11], Pyretic [35], Maple [51], FlowLog [37], and others are enabling programmers to specify the intended behavior of a network in terms of high-level constructs such as Boolean predicates and functions on packets. Verification tools like Header Space Analysis [21], VeriFlow [22], and NetKAT [12] are making it possible to check properties such as connectivity, loop freedom, and traffic isolation automatically.

However, despite many notable advances, these frameworks all have a fundamental limitation: they model network behavior in terms of deterministic packet-processing functions. This approach works well enough in settings where the network functionality is simple, or where the properties of interest only concern the forwarding paths used to carry traffic. But it does not provide satisfactory accounts of more complicated situations that often arise in practice:

  • Congestion: the network operator wishes to calculate the expected degree of congestion on each link given a model of the demands for traffic.

  • Failure: the network operator wishes to calculate the probability that packets will be delivered to their destination, given that devices and links fail with a certain probability.

  • Randomized Forwarding: the network operator wishes to use randomized routing schemes such as equal cost multi-path routing (ECMP) or Valiant load balancing (VLB) to balance load across multiple paths.

Overall, there is a mismatch between the realities of modern networks and the capabilities of existing reasoning frameworks. This paper presents a new framework, Probabilistic NetKAT (ProbNetKAT), that is designed to bridge this gap.

Background. As its name suggests, ProbNetKAT is based on NetKAT, a network programming language developed in prior work [1, 12, 47]. NetKAT is an extension of Kleene algebra with tests (KAT), an algebraic system for propositional verification of imperative programs that has been extensively studied for nearly two decades [25]. At the level of syntax, NetKAT offers a rich collection of intuitive constructs including: conditional tests; primitives for modifying packet headers and encoding topologies; and sequential, parallel, and iteration operators. The semantics of the language can be understood in terms of a denotational model based on functions from packet histories to sets of packet histories (where a history records the path through the network taken by a packet) or equivalently, using an equational deductive system that is sound and complete with respect to the denotational semantics. NetKAT has a PSPACE decision procedure that exploits the coalgebraic structure of the language and can solve many verification problems automatically [12]. Several practical applications of NetKAT have been developed, including algorithms for testing reachability and non-interference, a syntactic correctness proof for a compiler that translates programs to hardware instructions for SDN switches, and a compiler that handles source programs written against virtual topologies [47].

Challenges. Probabilistic NetKAT enriches the semantics of NetKAT so that programs denote functions that yield probability distributions on sets of packet histories. Although this change is simple at the surface, it enables adding powerful primitives such as random choice, which can be used to handle the scenarios above involving congestion, failure, and randomized forwarding. At the same time, it creates significant challenges because the semantics must be extended to handle probability distributions while preserving the intuitive meaning of NetKAT’s existing programming constructs. A number of important questions do not have obvious answers: Should the semantics be based on discrete or continuous distributions? How should it handle operators such as parallel composition that combine multiple distributions into a single distribution? Do suitable fixpoints exist that can be used to provide semantics for iteration?

Approach. Our development of a semantics for ProbNetKAT follows a classic approach: we first define a suitable mathematical space of objects and then identify semantic objects in this space that serve as denotations for each of the syntactic constructs in the language. Our semantics is based on Markov kernels over sets of packet histories. To a first approximation, these can be thought of as functions that produce a probability distribution on sets of packet histories, but the properties of Markov kernels ensure that important operators such as sequential composition behave as expected. The parallel composition operator is particularly interesting, since it must combine disjoint and overlapping distributions (the latter models multicast), as is the Kleene star operator, since it requires showing that fixpoints exist.

Evaluation. To evaluate our design, we prove that the probabilistic semantics of ProbNetKAT is a conservative extension of the standard NetKAT semantics. This is a crucial point of our work: the language developed in this paper is based on NetKAT, which in turn is an extension of KAT, a well-established framework for program verification. Hence, this work can be seen as the next step in the modular development of an expressive network programming language, with increasingly sophisticated set of features, based on a sound and long-standing mathematical foundation. We also develop a number of case studies that illustrate the use of the semantics on examples inspired by real-world scenarios. These case studies model congestion, failure, and randomization, as discussed above, as well as a gossip protocol that disseminates information through a network.

Contributions. Overall, the contributions of this paper are as follows:

  • We present the design of ProbNetKAT, the first language-based framework for specifying and verifying probabilistic network behavior.

  • We define a formal semantics for ProbNetKAT based on Markov kernels, prove that it conservatively extends the semantics of NetKAT, and develop a notion of approximation between programs.

  • We develop a number of case studies that illustrate the use of ProbNetKAT on real-world examples.

Outline. The rest of this paper is organized as follows: Sect. 2 introduces the basic ideas behind ProbNetKAT through an example; Sect. 3 reviews concepts from measure theory needed to define the semantics; Sect. 4 and Sect. 5 present the syntax and semantics of ProbNetKAT; Sect. 6 illustrates the semantics by proving conservativity and some natural equations; Sect. 8 discusses applications of the semantics to real-world examples. We discuss related work in Sect. 9 and conclude in Sect. 10. Proofs and further semantic details can be found in the extended version of this paper [13].

2 Overview

This section introduces ProbNetKAT using a simple example and discusses some of the key challenges in designing the language.

Preliminaries. A packet \(\pi \) is a record with fields \(x_1\) to \(x_k\) ranging over standard header fields (Ethernet and IP addresses, TCP ports, etc.) as well as special switch and port fields indicating its network location: \( \{ x_1 = n_1, \dots , x_k = n_k \} \). We write \(\pi (x)\) for the value of \(\pi \)’s x field and \(\pi [n/x]\) for the packet obtained from \(\pi \) by setting the x field to n. We often abbreviate the switch field as \( sw \). A packet history \(\sigma \) is a non-empty sequence of packets \(\pi _1{:}\pi _2{:}\cdots {:}\pi _m\), listed from youngest to oldest. Operationally, only the head packet \(\pi _1\) exists in the network, but in the semantics we keep track of the packet’s history to enable precise specification of forwarding along specific paths. We write \(\pi {:}\eta \) for the history with head \(\pi \) and (possibly empty) tail \(\eta \) and H for the set of all histories.
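These definitions can be mirrored in a small Python sketch (the representation and all names here are our own, not the paper's): a packet as an immutable set of field/value pairs, and a history as a non-empty tuple of packets listed head (youngest) first.

```python
# Sketch of packets and histories (hypothetical encoding, not from the paper).

def make_packet(**fields):
    """A packet as an immutable set of field/value pairs."""
    return frozenset(fields.items())

def get_field(pkt, x):
    """pi(x): the value of pi's field x."""
    return dict(pkt)[x]

def set_field(pkt, x, n):
    """pi[n/x]: the packet obtained from pi by setting field x to n."""
    d = dict(pkt)
    d[x] = n
    return frozenset(d.items())

pi = make_packet(sw=1, pt=2)
sigma = (pi,)                               # a one-packet history: just the head
sigma2 = (set_field(pi, "sw", 3),) + sigma  # new head; old head becomes the tail
```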

Fig. 1. Barbell topology.

Example. Consider the network shown in Fig. 1 with six switches arranged into a “barbell” topology. Suppose the network operator wants to configure the switches to forward traffic on the two left-to-right paths from \(I_1\) to \(E_1\) and \(I_2\) to \(E_2\). We can implement this in ProbNetKAT as follows:

$$ \begin{aligned} p \triangleq&\left( sw = I_1 ; \mathrel {\mathtt {dup}}; sw \leftarrow C_1; \mathrel {\mathtt {dup}}; sw \leftarrow C_2; \mathrel {\mathtt {dup}}; sw \leftarrow E_1\right) \mathrel { \& }\\&\left( sw = I_2 ; \mathrel {\mathtt {dup}}; sw \leftarrow C_1; \mathrel {\mathtt {dup}}; sw \leftarrow C_2; \mathrel {\mathtt {dup}}; sw \leftarrow E_2\right) \end{aligned}$$

Because it only uses deterministic constructs, this program can be modeled as a function \(f \in 2^H \rightarrow 2^H\) on sets of packet histories: the input represents the initial set of in-flight packets while the output represents the final set of results produced by the program. The empty set is produced when the input packets are dropped (e.g., in a firewall) and a set with more elements than the input set is produced when some input packets are copied (e.g., in multicast). Our example program consists of tests (\( sw = I_1\)), which filter the set of input packets, retaining only those whose head packets satisfy the test; modifications (\( sw \leftarrow C_1\)), which change the value of one of the fields in the head packet; duplication (\(\mathrel {\mathtt {dup}}\)), which archives the current value of the head packet in the history; and sequential (\(;\)) and parallel (\( \mathrel { \& }\)) composition operators. In this instance, the tests are mutually exclusive, so the parallel composition behaves like a disjoint union operator.

Now suppose the network operator wants to calculate not just where traffic is routed but also how much traffic is sent across each link. The deterministic semantics we have seen so far calculates the trajectories that packets take through the network. Hence, for a given set of inputs, we can use the semantics to calculate the set of output histories and then count how many packets traversed each link, yielding an upper bound on congestion. But now suppose we want to predict the amount of congestion that could be induced from a model that encodes expectations about the set of possible inputs. Such models, which are often represented as traffic matrices, can be built from historical monitoring data using a variety of statistical techniques [34]. Unfortunately, even simple calculations of how much congestion is likely to occur on a given link cannot be performed using the deterministic semantics.

Returning to the example, suppose that we wish to represent the following traffic model in ProbNetKAT: in each time period, the number of packets originating at \(I_1\) is either 0, 1, or 2, with equal probability, and likewise for \(I_2\). Let \(\pi _1\) to \(\pi _4\) be distinct packets, and write \(\pi _{I_{j},i}!\) for the sequence of assignments that produces the packet \(\pi _i\) located at switch \(I_j\). We can encode the distributions at \(I_1\) and \(I_2\) using the following ProbNetKAT terms:

$$ \begin{aligned} d_1&\triangleq \mathtt {drop}\oplus \pi _{I_{1},1}! \oplus (\pi _{I_{1},1}! \mathrel { \& }\pi _{I_{1},2}!)\\ d_2&\triangleq \mathtt {drop}\oplus \pi _{I_{2},3}! \oplus (\pi _{I_{2},3}! \mathrel { \& }\pi _{I_{2},4}!) \end{aligned}$$

Note that because \(d_1\) and \(d_2\) involve probabilistic choice, they denote functions whose values are distributions on sets of histories rather than simply sets of histories as before. However, because they do not contain tests, they are actually constant functions, so we can treat them as distributions. For the full input distribution to the network, we combine \(d_1\) and \(d_2\) independently using parallel composition: \( d \triangleq d_1 \mathrel { \& }d_2\).

To calculate a distribution that encodes the amount of congestion on network links, we can push the input distribution d through the forwarding policy p using sequential composition: \(d \mathrel {;} p\). This produces a distribution on sets of histories. In this example, there are nine such sets, where we write \(I_{1,1}\) to indicate that \(\pi _1\) was processed at \(I_1\), and similarly for the other switches and packets:

$$ \begin{array}{l} \{\,\}, \\ \{E_{1,1}{:}C_{2,1}{:}C_{1,1}{:}I_{1,1}\}, \\ \{E_{1,1}{:}C_{2,1}{:}C_{1,1}{:}I_{1,1},\, E_{1,2}{:}C_{2,2}{:}C_{1,2}{:}I_{1,2}\},\\ \{E_{2,3}{:}C_{2,3}{:}C_{1,3}{:}I_{2,3}\}, \\ \{E_{2,3}{:}C_{2,3}{:}C_{1,3}{:}I_{2,3},\, E_{2,4}{:}C_{2,4}{:}C_{1,4}{:}I_{2,4}\}, \\ \{E_{1,1}{:}C_{2,1}{:}C_{1,1}{:}I_{1,1},\, E_{2,3}{:}C_{2,3}{:}C_{1,3}{:}I_{2,3}\},\\ \{E_{1,1}{:}C_{2,1}{:}C_{1,1}{:}I_{1,1},\,E_{1,2}{:}C_{2,2}{:}C_{1,2}{:}I_{1,2}, \,E_{2,3}{:}C_{2,3}{:}C_{1,3}{:}I_{2,3}\},\\ \{E_{1,1}{:}C_{2,1}{:}C_{1,1}{:}I_{1,1},\,E_{2,3}{:}C_{2,3}{:}C_{1,3}{:}I_{2,3}, \,E_{2,4}{:}C_{2,4}{:}C_{1,4}{:}I_{2,4}\},\\ \{E_{1,1}{:}C_{2,1}{:}C_{1,1}{:}I_{1,1},\, E_{1,2}{:}C_{2,2}{:}C_{1,2}{:}I_{1,2},\,E_{2,3}{:}C_{2,3}{:}C_{1,3}{:}I_{2,3},\, E_{2,4}{:}C_{2,4}{:}C_{1,4}{:}I_{2,4}\}\\ \end{array} $$

and the output distribution is uniform, each set occurring with probability 1/9. Now suppose we wish to calculate the expected number of packets traversing the link \(\ell \) from \(C_1\) to \(C_2\). We can filter the output distribution on the set

$$\begin{aligned} b&\triangleq \{\sigma \mid C_{2,i}{:}C_{1,i} \in \sigma \text { for some } i\} \end{aligned}$$

and ask for the expected size of the result. The filtering is again done by composition, viewing b as an extended test. (In this example, all histories traverse \(\ell \), so b actually has no effect.) The expected number of packets traversing \(\ell \) is given by integration:

$$ \int _{a\in 2^H} |a|\cdot [\!\![d;p;b]\!\!](da) = 2. $$

Hence, even in a simple example where forwarding is deterministic, our semantics for ProbNetKAT is quite useful: it enables making predictions about quantitative properties such as congestion, which can be used to provision capacity, inform traffic engineering algorithms, or calculate the risk that service-level agreements may be violated. More generally, ProbNetKAT can be used to express much richer behaviors such as randomized routing, faulty links, gossip, etc., as shown by the examples presented in Sect. 8.
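The expectation computed by the integral above can be checked concretely with a short script (a sketch using made-up packet names; it exploits the fact that in this example every delivered packet crosses \(\ell \), so the congestion on \(\ell \) is just the size of the sampled output set).

```python
from fractions import Fraction
from itertools import product

# Outcomes of d1 and d2: 0, 1, or 2 injected packets, each with probability 1/3.
d1 = {frozenset(): Fraction(1, 3),
      frozenset({"pi1"}): Fraction(1, 3),
      frozenset({"pi1", "pi2"}): Fraction(1, 3)}
d2 = {frozenset(): Fraction(1, 3),
      frozenset({"pi3"}): Fraction(1, 3),
      frozenset({"pi3", "pi4"}): Fraction(1, 3)}

# d = d1 & d2: sample the two distributions independently, take the union.
d = {}
for (a, pa), (b, pb) in product(d1.items(), d2.items()):
    d[a | b] = d.get(a | b, Fraction(0)) + pa * pb

# Nine equiprobable outcome sets, matching the display above.
assert len(d) == 9 and all(p == Fraction(1, 9) for p in d.values())

# Every injected packet traverses the link from C1 to C2, so the packet
# count on that link equals the size of the sampled set.
expected = sum(len(a) * p for a, p in d.items())
print(expected)  # 2
```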

Challenges. We faced several challenges in formulating the semantics of ProbNetKAT in a satisfactory way. The deterministic semantics of NetKAT [1, 12] interprets programs as packet-processing functions on sets of packet histories. This is different enough from other probabilistic models in the literature that it was not obvious how to apply standard approaches. On the one hand, we wanted to extend the deterministic semantics conservatively—i.e., a ProbNetKAT program that makes no probabilistic choices should behave the same as under the deterministic NetKAT semantics. This goal was achieved (Theorem 2) using the notion of a Markov kernel, well known from previous work in probabilistic semantics [10, 24, 39]. Among other things, conservativity enables using NetKAT axioms to reason about deterministic sub-terms of ProbNetKAT programs. On the other hand, when moving to the probabilistic domain, several properties enjoyed by the deterministic version were lost, and great care was needed to formulate the new semantics correctly. Most notably, it is no longer the case that the meaning of a program on an input set of packet histories is uniquely determined by its action on individual histories (Sect. 6.4). The parallel composition operator (\( \mathrel { \& }\)), which supplants the union operator (\(+\)) of NetKAT, is no longer idempotent except when applied to deterministic programs (Lemma 1(vi)), and distributivity no longer holds in general (Lemma 4). Nevertheless, the semantics provides a powerful means of reasoning that is sufficient to derive many interesting and useful properties of networks (Sect. 8).

Perhaps the most challenging theoretical problem was the formulation of the semantics of iteration (\(^*\)). In the deterministic version, the iteration operator can be defined as a sum of powers. In ProbNetKAT, this approach does not work, as it requires that parallel composition be idempotent. Hence, we formulate the semantics of iteration in terms of an infinite stochastic process. Giving denotational meaning to this operational construction required an intricate application of the Kolmogorov extension theorem. This formulation gives a canonical solution to an appropriate fixpoint equation as desired (Theorem 1). However the solution is not unique, and it is not a least fixpoint in any natural ordering that we are aware of.

Another challenge was the observation that in the presence of both duplication (\(\mathrel {\mathtt {dup}}\)) and iteration (\(^*\)), models based on discrete distributions do not suffice, and it is necessary to base the semantics on an uncountable state space with continuous measures and sequential composition defined by integration. Most models in the literature only deal with discrete distributions, with a few notable exceptions (e.g. [10, 23, 24, 38, 39]). To see why a discrete semantics suffices in the absence of either duplication or iteration note that H is a countable set. Without iteration, we could limit our attention to distributions on finite subsets of H, which are also countable. Similarly, with iteration but without duplication, the set of histories that could be generated by a program is actually finite. Hence a discrete semantics would suffice in that case as well, even though iterative processes would not necessarily converge after finitely many steps as with deterministic processes. However, in the presence of both duplication and iteration, infinite sets and continuous measures are unavoidable (Sect. 6.3), although in specific applications, discrete distributions sometimes suffice.

3 Measure Theory Primer

This section introduces the background mathematics necessary to understand the semantics of ProbNetKAT. Because ProbNetKAT requires continuous probability distributions, we review some basic measure theory. See Halmos [17], Chung [5], or Rao [42] for a more thorough treatment.

Overview. Measures are a generalization of the concepts of length or volume of Euclidean geometry to other spaces, and form the basis of continuous probability theory. In this section, we explain what it means for a space to be measurable, show how to construct measurable spaces, and give basic operations and constructions on measurable spaces including Lebesgue integration with respect to a measure and the construction of product spaces. We also define the crucial notion of Markov kernels, the analog of Markov transition matrices for finite-state stochastic processes, which form the basis of our semantics for ProbNetKAT.

Measurable Spaces and Measurable Functions. A \(\sigma \) -algebra \(\mathcal B\) on a set S is a collection of subsets of S containing \(\emptyset \) and closed under complement and countable union (hence also closed under countable intersection). A pair \((S,\mathcal B)\) where S is a set and \(\mathcal B\) is a \(\sigma \)-algebra on S is called a measurable space. If the \(\sigma \)-algebra is obvious from the context, we simply say that S is a measurable space. For a measurable space \((S,\mathcal B)\), we say that a subset \(A \subseteq S\) is measurable if it is in \(\mathcal B\). For applications in probability theory, elements of S and \(\mathcal {B}\) are often called outcomes and events, respectively.

If \(\mathcal F\) is a collection of subsets of a set S, then we define \(\sigma (\mathcal F)\), the \(\sigma \)-algebra generated by \(\mathcal F\), to be the smallest \(\sigma \)-algebra that contains \(\mathcal F\). That is,

$$ \sigma (\mathcal F) \triangleq {\textstyle \bigcap }\{ \mathcal A \mid \mathcal F \subseteq \mathcal A\,\mathrm{and}\,\mathcal A\,\mathrm{is\,a}\,\sigma \mathrm{-algebra} \}. $$

Note that \(\sigma (\mathcal F)\) is well-defined, since the intersection is nonempty (we have that \(\mathcal F \subseteq \mathcal {P}(S)\), and \(\mathcal {P}(S)\) is a \(\sigma \)-algebra). If \((S,\mathcal B)\) is a measurable space and \(\mathcal B = \sigma (\mathcal F)\), we say that the space is generated by \(\mathcal F\).

Let \((S,\mathcal {B}_S)\) and \((T,\mathcal {B}_T)\) be measurable spaces. A function \(f: S \rightarrow T\) is measurable if the inverse image \(f^{-1}(B) = \{x \in S \mid f(x) \in B \}\) of every measurable subset \(B \subseteq T\) is a measurable subset of S. For the particular case where T is generated by the collection \(\mathcal F\), we have the following criterion for measurability: f is measurable if and only if \(f^{-1}(B)\) is measurable for every \(B \in \mathcal F\).

Measures. A measure on \((S,\mathcal {B})\) is a countably additive map \(\mu :\mathcal {B}\rightarrow \mathbb {R}\). The condition that the map be countably additive stipulates that if \(A_i\in \mathcal {B}\) is a countable set of pairwise disjoint events, then \(\mu (\bigcup _i A_i) = \sum _i \mu (A_i)\). Equivalently, if \(A_i\) is a countable chain of events, that is, if \(A_i\mathrel {\subseteq }A_j\) for \(i\le j\), then \(\lim _i \mu (A_i)\) exists and is equal to \(\mu (\bigcup _i A_i)\). A measure is a probability measure if \(\mu (A)\ge 0\) for all \(A\in \mathcal {B}\) and \(\mu (S)=1\). By convention, \(\mu (\emptyset )=0\).

For every \(a\in S\), the Dirac measure on a is the probability measure:

$$\begin{aligned} \delta _{a}(A)&= {\left\{ \begin{array}{ll} 1, &{} a\in A,\\ 0, &{} a\not \in A. \end{array}\right. } \end{aligned}$$

A measure is discrete if it is a countable weighted sum of Dirac measures.
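In the discrete case, a measure can be represented simply as a mapping from outcomes to weights; the following sketch (hypothetical encoding, not from the paper) illustrates the Dirac measure.

```python
from fractions import Fraction

# Discrete measures as mappings from outcomes to weights; the Dirac
# measure puts all mass on a single outcome.

def dirac(a):
    return {a: Fraction(1)}

def measure_of(mu, event):
    """mu(A), where the event A is given as a predicate on outcomes."""
    return sum((p for a, p in mu.items() if event(a)), Fraction(0))

d = dirac(frozenset({"pi1"}))
assert measure_of(d, lambda a: "pi1" in a) == 1   # pi1 is in the outcome
assert measure_of(d, lambda a: "pi2" in a) == 0   # pi2 never is
```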

Markov Kernels. Again let \((S,\mathcal {B}_S)\) and \((T,\mathcal {B}_T)\) be measurable spaces. A function \(P:S\times \mathcal {B}_T\rightarrow \mathbb {R}\) is called a Markov kernel (also called a Markov transition, measurable kernel, stochastic kernel, stochastic relation, etc.) if

  • for fixed \(A\in \mathcal {B}_T\), the map \(\lambda s.P(s,A):S\rightarrow \mathbb {R}\) is a measurable function on \((S,\mathcal {B}_S)\); and

  • for fixed \(s\in S\), the map \(\lambda A.P(s,A):\mathcal {B}_T\rightarrow \mathbb {R}\) is a probability measure on \((T,\mathcal {B}_T)\).

These properties allow integration on the left and right respectively.

The measurable spaces and Markov kernels form a category, the Kleisli category of the Giry monad; see [10, 38, 39]. In this context, we occasionally write \(P:(S,\mathcal {B}_S)\rightarrow (T,\mathcal {B}_T)\) or just \(P:S\rightarrow T\). Composition is given by integration: for \(P:S\rightarrow T\) and \(Q:T\rightarrow U\),

$$\begin{aligned} (P\mathrel {;}Q)(s,A)&= \int _{t\in T} P(s,dt)\cdot Q(t,A). \end{aligned}$$

Associativity of composition is essentially Fubini’s theorem (see Chung [5] or Halmos [17]). Markov kernels were first proposed as a model of probabilistic while programs by Kozen [24].
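For discrete kernels the integral reduces to a sum, and composition is exactly matrix multiplication of transition matrices. A small sketch (toy states and our own encoding, not from the paper):

```python
from fractions import Fraction

# Discrete Markov kernels as dicts: state -> {next state: probability}.
P = {"s": {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}}
Q = {"t1": {"u": Fraction(1)},
     "t2": {"u": Fraction(1, 3), "v": Fraction(2, 3)}}

def compose(P, Q):
    """(P;Q)(s, A) = sum_t P(s, {t}) * Q(t, A): the discrete analogue
    of composition by integration."""
    R = {}
    for s, mu in P.items():
        out = {}
        for t, p in mu.items():
            for u, q in Q[t].items():
                out[u] = out.get(u, Fraction(0)) + p * q
        R[s] = out
    return R

R = compose(P, Q)
assert R["s"] == {"u": Fraction(2, 3), "v": Fraction(1, 3)}
```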

Deterministic Kernels. A Markov kernel \(P:S\rightarrow T\) is deterministic if for every \(s\in S\), there is an \(f(s)\in T\) such that

$$\begin{aligned} P(s,A)&= \delta _{f(s)}(A) = \delta _{s}(f^{-1}(A)) = \chi _{A}(f(s)). \end{aligned}$$

where \(\chi _{A}\) is the characteristic function of A. The function \(f:S\rightarrow T\) is necessarily measurable. Conversely, every measurable function gives a deterministic kernel. Thus, the deterministic kernels and the measurable functions are in one-to-one correspondence.
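This correspondence can be sketched for a toy state space (names are our own): a function \(f : S \rightarrow T\) determines the kernel \(P(s,A) = \chi _A(f(s))\).

```python
# Lift a function f : S -> T to the deterministic kernel P(s, A) = chi_A(f(s)).

def deterministic_kernel(f):
    return lambda s, A: 1 if f(s) in A else 0

P = deterministic_kernel(lambda s: s + 1)
assert P(3, {4}) == 1       # all mass sits on f(3) = 4
assert P(3, {5, 6}) == 0    # no mass anywhere else
```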

4 Syntax

ProbNetKAT extends NetKAT [1, 12], which is itself based on Kleene algebra with tests (KAT) [25], a generic equational system for reasoning about partial correctness of programs.

4.1 Kleene Algebra (KA) & Kleene Algebra with Tests (KAT)

A Kleene algebra (KA) is an algebraic structure \( (K,\,+,\,\cdot ,\,^* ,\,0,\,1) \), where K is an idempotent semiring under \((+,\cdot ,0,1)\), and \({{p}^{*}} \cdot {q}\) is the least solution \(r\) of the affine inequality \({p}\cdot {r} + q \le r\), where \(p\le q\) is shorthand for \(p + q = q\); the product \({q} \cdot {{p}^{*}}\) is characterized symmetrically. A Kleene algebra with tests (KAT) is a two-sorted algebraic structure, \( (K,\,B,\,+,\,\cdot ,\,^*,\,0,\,1,\,\lnot ) \), where \(\lnot \) is a unary operator defined only on B, such that

  • \((K,\,+,\,\cdot ,\,^* ,\,0,\,1)\) is a Kleene algebra,

  • \((B,\,+,\,\cdot ,\,\lnot {\ },\,0,\,1)\) is a Boolean algebra, and

  • \((B,\,+,\,\cdot ,\,0,\,1)\) is a subalgebra of \((K,\,+,\,\cdot ,\,0,\,1)\).

The elements of B and K are usually called tests and actions.

The axioms of KA and KAT (both elided here) capture natural conditions such as associativity of \(\cdot \); see the original paper by Kozen for a complete listing [25]. Note that the KAT axioms do not hold for arbitrary ProbNetKAT programs (e.g., parallel composition is not idempotent), although they do hold for the deterministic fragment of the language.

4.2 NetKAT Syntax

NetKAT [1, 12] extends KAT with network-specific primitives for filtering, modifying, and forwarding packets, along with additional axioms for reasoning about programs built using those primitives. Formally, NetKAT is KAT with atomic tests \(x=n\) and actions \(x \leftarrow n\) and \(\mathrel {\mathtt {dup}}\). The test \(x=n\) checks whether field x of the current packet contains the value n; the assignment \(x\leftarrow n\) assigns the value n to the field x in the current packet; the action \(\mathrel {\mathtt {dup}}\) duplicates the packet in the packet history, which provides a way to keep track of the path the packet takes through the network. In NetKAT, we write \(;\) instead of \(\cdot \), \(\mathtt {skip}\) instead of 1, and \(\mathtt {drop}\) instead of 0, to capture their intuitive use as programming constructs. We often use juxtaposition to indicate sequential composition in examples. As an example to illustrate the main features of NetKAT, the expression

$$\begin{aligned} sw =6\mathrel {;} pt =8\mathrel {;} dst \leftarrow 10.0.1.5\mathrel {;} pt \leftarrow 5 \end{aligned}$$

encodes the command: “For all packets located at port 8 of switch 6, set the destination address to 10.0.1.5 and forward it out on port 5.”

Fig. 2. ProbNetKAT syntax.

4.3 ProbNetKAT Syntax

ProbNetKAT extends NetKAT with several new operations, as shown in the grammar in Fig. 2:

  • A random choice operation \(p\mathrel {\oplus _r}q\), where p and q are expressions and r is a real number in the interval [0, 1]. The expression \(p\mathrel {\oplus _r}q\) intuitively behaves according to p with probability r and q with probability \(1-r\). We frequently omit the subscript r, in which case r is implicitly taken to be 1/2.

  • A parallel composition operation \( p\mathrel { \& }q\), where p and q are expressions. The expression \( p\mathrel { \& }q\) intuitively says to perform both p and q, making any probabilistic choices in p and q independently, and combine the results. The operation \( \mathrel { \& }\) serves the same purpose as \(+\) in NetKAT and replaces it syntactically. We use the notation \( \mathrel { \& }\) to distinguish it from \(+\), which is used in the semantics to add measures and measurable functions as in [23, 24].

  • Extended tests t which generalize NetKAT’s tests by allowing them to operate on the entire packet history rather than simply the head packet. Formally an extended test t is just an element of \(2^H\). The extended test \(\mathtt {skip}\) is defined as the set of all packet histories and \(\mathtt {drop}\) is the empty set. An atomic test \(x = n\) is defined as the set of all histories \(\sigma \) where the x field of the head packet of \(\sigma \) is n. As we saw in Sect. 2, extended tests are often useful for reasoning probabilistically about properties such as congestion.
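The intuitive sampling reading of \(\oplus _r\) and \( \mathrel { \& }\) described above can be sketched as follows (a hypothetical encoding, not the paper's semantics: a program maps a set of histories to one randomly sampled output set).

```python
import random

def choice(p, q, r=0.5):
    """p (+)_r q: behave as p with probability r, else as q."""
    return lambda a: p(a) if random.random() < r else q(a)

def par(p, q):
    """p & q: run both with independent coins and union the results."""
    return lambda a: p(a) | q(a)

drop = lambda a: set()       # 0: discard all packets
skip = lambda a: set(a)      # 1: pass all packets through unchanged

prog = par(skip, choice(drop, skip))
out = prog({"sigma"})        # skip's copy always survives the union
```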

Although ProbNetKAT is based on KAT, it is important to keep in mind that because the semantics is probabilistic, many familiar KAT equations no longer hold. For example, idempotence of parallel composition does not hold in general. We will however prove that ProbNetKAT conservatively extends NetKAT, so it follows that the NetKAT axioms hold on the deterministic fragment.

Fig. 3. NetKAT semantics: primitive actions and tests (left); KAT operations (right).

5 Semantics

The standard semantics of (non-probabilistic) NetKAT interprets expressions as packet-processing functions. As defined in Fig. 2, a packet \(\pi \) is a record whose fields assign constant values n to fields x and a packet history \(\sigma \) is a nonempty sequence of packets \(\pi _1{:}\pi _2{:}\cdots {:}\pi _k\), listed from youngest to oldest. Recall that operationally, only the head packet \(\pi _1\) exists in the network, but we keep track of the history to enable precise specification of forwarding along specific paths.

5.1 NetKAT Semantics

Formally, a (non-probabilistic) NetKAT term p denotes a function

$$\begin{aligned}{}[\!\![p]\!\!]:H \rightarrow 2^H, \end{aligned}$$

where H is the set of all packet histories. Intuitively, the function \([\!\![p]\!\!]\) takes an input packet history \(\sigma \) and produces a set of output packet histories \([\!\![p]\!\!](\sigma )\).

The semantics of NetKAT is shown in Fig. 3. The test \(x=n\) drops the packet if the test is not satisfied and otherwise passes it through unaltered. Put another way, tests behave like filters. The \(\mathrel {\mathtt {dup}}\) construct duplicates the head packet \(\pi \), yielding a fresh copy that can be modified by other constructs. Hence, the \(\mathrel {\mathtt {dup}}\) construct can be used to encode paths through the network, with each occurrence of \(\mathrel {\mathtt {dup}}\) marking an intermediate hop. Note that \(+\) behaves like a disjunction operation when applied to tests and like a union operation when applied to actions. Similarly, ;  behaves like a conjunction operation when applied to tests and like a sequential composition when applied to actions. Negation is only ever applied to tests, as is enforced by the syntax of the language.
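This deterministic semantics can be prototyped directly as Python functions from a history to a set of histories (a sketch under our own encoding; names such as filt are ours, and packets are frozensets of field/value pairs with histories as head-first tuples).

```python
# A minimal interpreter for the deterministic fragment, [[p]] : H -> 2^H.

def filt(x, n):
    """Test x = n: keep the history iff the head packet's x field is n."""
    return lambda sigma: {sigma} if dict(sigma[0]).get(x) == n else set()

def assign(x, n):
    """Assignment x <- n: update field x of the head packet."""
    def f(sigma):
        head = dict(sigma[0])
        head[x] = n
        return {(frozenset(head.items()),) + sigma[1:]}
    return f

def dup(sigma):
    """dup: archive a copy of the head packet in the history."""
    return {(sigma[0],) + sigma}

def seq(p, q):
    """p ; q: run q on every output of p and union the results."""
    return lambda sigma: {tau for rho in p(sigma) for tau in q(rho)}

def plus(p, q):
    """p + q: union of the two output sets."""
    return lambda sigma: p(sigma) | q(sigma)

# The example from Sect. 4.2: sw = 6 ; pt = 8 ; dst <- 10.0.1.5 ; pt <- 5
prog = seq(filt("sw", 6), seq(filt("pt", 8),
           seq(assign("dst", "10.0.1.5"), assign("pt", 5))))
pkt = frozenset({("sw", 6), ("pt", 8)})
out = prog((pkt,))
```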

5.2 Sets of Packet Histories as a Measurable Space

To give a denotational semantics to ProbNetKAT, we must first identify a suitable space of mathematical objects. Because we want to reason about probability distributions over sets of network paths, we construct a measurable space (as defined in Sect. 3) from sets of packet histories, and then define the semantics using Markov kernels on this space. The powerset \(2^H\) of packet histories H forms a topological space with topology generated by basic clopen sets,

$$\begin{aligned} B_\tau&= \{a\in 2^H \mid \tau \in a\},\ \tau \in H. \end{aligned}$$

This space is homeomorphic to the Cantor space, the topological product of countably many copies of the discrete two-element space. In particular, it is also compact. Let \(\mathcal {B}\mathrel {\subseteq }2^{2^H}\) be the Borel sets of this topology. This is the smallest \(\sigma \)-algebra containing the sets \(B_\tau \). The measurable space \((2^H,\mathcal {B})\) with outcomes \(2^H\) and events \(\mathcal {B}\) provides a foundation for interpreting ProbNetKAT programs as Markov kernels \(2^H\rightarrow 2^H\).

5.3 The Operation \( \mathrel { \& }\)

Next, we define an operation on measures that will be needed to define the semantics of ProbNetKAT’s parallel composition operator. Parallel composition differs in some important ways from NetKAT’s union operator—intuitively, union merely combines the sets of packet histories generated by its arguments, whereas parallel composition must somehow combine measures on sets of packet histories, which is a more intricate operation. For example, while union is idempotent, parallel composition is not in general.

Operationally, the \( \mathrel { \& }\) operation on measures can be understood as follows: given measures \(\mu \) and \(\nu \), to compute \( \mu \mathrel { \& }\nu \), we sample \(\mu \) and \(\nu \) independently to get two subsets of H, then take their union. The probability of an event \(A\in \mathcal {B}\) is the probability that this union is in A. Formally, given \(\mu ,\nu \in \mathcal {M}\), let \(\mu \times \nu \) be the product measure on the product space \(2^H\times 2^H\). The union operation \(\bigcup :2^H\times 2^H\rightarrow 2^H\) is continuous and therefore measurable, so we can define:

$$ \begin{aligned} (\mu \mathrel { \& }\nu )(A)&\triangleq (\mu \times \nu )(\{(a,b) \mid a\cup b\in A\}). \end{aligned}$$
(5.1)

Intuitively, this is the probability that the union \(a\cup b\) of two independent samples taken with respect to \(\mu \) and \(\nu \) lies in A. The \( \mathrel { \& }\) operation enjoys a number of useful properties, as captured by the following lemma:

Lemma 1

  1. (i)

    \( \mathrel { \& }\) is associative and commutative.

  2. (ii)

    \( \mathrel { \& }\) is linear in both arguments.

  3. (iii)

    \( (\delta _{a} \mathrel { \& }\mu )(A) = \mu (\{b \mid a\cup b\in A\})\).

  4. (iv)

    \( \delta _{a} \mathrel { \& }\delta _{b} = \delta _{a\cup b}\).

  5. (v)

    \(\delta _{\emptyset }\) is a two-sided identity for \( \mathrel { \& }\).

  6. (vi)

    \( \mu \mathrel { \& }\mu =\mu \) iff \(\mu =\delta _{a}\) for some \(a\in 2^H\).

There is an infinitary version of \( \mathrel { \& }\) that works on countable multisets of measures, but we will not need it in this paper.
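To make the definition of \( \mathrel { \& }\) concrete, here is a minimal sketch for *discrete* (finite-support) measures, which suffices to check the laws of Lemma 1 on small examples. A measure is represented as a dict mapping outcomes (frozensets of histories) to probabilities; the names `dirac` and `par` are ours, not the paper's.

```python
# Hedged sketch: the & operation on discrete measures over sets of
# packet histories. A measure is a dict {frozenset: probability}.

def dirac(a):
    """Point mass (Dirac measure) delta_a on the set a."""
    return {frozenset(a): 1.0}

def par(mu, nu):
    """mu & nu: sample mu and nu independently and take the union."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            u = a | b
            out[u] = out.get(u, 0.0) + p * q
    return out
```

For example, `par(dirac({"h1"}), dirac({"h2"}))` equals `dirac({"h1", "h2"})`, as in Lemma 1(iv), while for a fair coin over `{"h1"}` and `{"h2"}`, `par(coin, coin)` puts weight 1/2 on the two-element set, illustrating the failure of idempotence in Lemma 1(vi).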

5.4 ProbNetKAT Semantics

Now we are ready to define the semantics of ProbNetKAT itself. Every ProbNetKAT term p denotes a Markov kernel

$$\begin{aligned}{}[\!\![p]\!\!]:2^H\times \mathcal {B}\rightarrow \mathbb {R}\end{aligned}$$

which can be curried as

$$\begin{aligned}{}[\!\![p]\!\!]:2^H\rightarrow \mathcal {B}\rightarrow \mathbb {R}\end{aligned}$$

Intuitively, the term p, given an input \(a\in 2^H\), produces an output according to the distribution \([\!\![p]\!\!](a)\). We can think of running the program p with input a as a probabilistic experiment, and the value \([\!\![p]\!\!](a,A)\in \mathbb {R}\) is the probability that the outcome of the experiment lies in \(A\in \mathcal {B}\). The measure \([\!\![p]\!\!](a)\) is not necessarily discrete (Sect. 6.3): its total weight is always 1, although the probability of any given singleton may be 0.

The semantics of the atomic operations is defined as follows for \(a\in 2^H\):

$$\begin{aligned}{}[\!\![x\leftarrow n]\!\!](a)&= \delta _{\{\pi [n/x]{:}\eta \, \mid \,\pi {:}\eta \in a\}}\\ [\!\![x=n]\!\!](a)&= \delta _{\{\pi {:}\eta \, \mid \,\pi {:}\eta \in a,\ \pi (x)=n\}}\\ [\!\![\mathrel {\mathtt {dup}}]\!\!](a)&= \delta _{\{\pi {:}\pi {:}\eta \, \mid \,\pi {:}\eta \in a\}}\\ [\!\![\mathtt {skip}]\!\!](a)&= \delta _{a}\\ [\!\![\mathtt {drop}]\!\!](a)&= \delta _{\emptyset }\end{aligned}$$

These are all deterministic terms, and as such, they correspond to measurable functions \(f:2^H\rightarrow 2^H\). In each of these cases, the function f is completely determined by its action on singletons, and indeed by its action on the head packet of the unique element of each of those singletons. Note that if no elements of a satisfy the test \(x=n\), the result is \(\delta _{\emptyset }\), which is the Dirac measure on the empty set, not the constant 0 measure. The semantics of the parallel/sequential composition and random choice operators is defined as follows:

$$ \begin{aligned}{}[\!\![p\mathrel { \& }q]\!\!](a)&= [\!\![p]\!\!](a) \mathrel { \& }[\!\![q]\!\!](a)\\ [\!\![p\mathrel {;}q]\!\!](a)&= [\!\![q]\!\!]([\!\![p]\!\!](a))\\ [\!\![p\mathrel {\oplus _r}q]\!\!](a)&= r[\!\![p]\!\!](a) + (1-r)[\!\![q]\!\!](a) \end{aligned}$$

Note that composition requires us to extend \([\!\![q]\!\!]\) to allow measures as inputs. It is not surprising that this extension is needed: in NetKAT, the semantics is similarly extended to handle sets of histories. Here, the extension is done by integration, as described in Sect. 3:

$$\begin{aligned}{}[\!\![q]\!\!](\mu )&\triangleq \lambda A.\int _{a\in 2^H}[\!\![q]\!\!](a,A)\cdot \mu (da), \quad \text {for} \, \mu \, \text {a}\,\text {measure}\,\text {on}\,2^H. \end{aligned}$$

Both phenomena are consequences of sequential composition taking place in the Kleisli category of the powerset and Giry monads respectively.

5.5 Semantics of Iteration

To complete the semantics, we must define the semantics of the Kleene star operator. This turns out to be quite challenging, because the usual definition of star as a sum of powers does not work with ProbNetKAT. Instead, we define an infinite stochastic process and show that it satisfies the essential fixpoint equation that Kleene star is expected to obey (Theorem 1).

Consider the following infinite stochastic process. Starting with \(c_0\in 2^H\), create a sequence \(c_0,c_1,c_2,\ldots \) inductively. After n steps, say we have constructed \(c_{0},\ldots ,c_{n}\). Let \(c_{n+1}\) be the outcome obtained by sampling \(2^H\) according to the distribution \([\!\![p]\!\!](c_n)\). Continue this process forever to get an infinite sequence \(c_0,c_1,c_2,\ldots \in (2^H)^\omega \). Take the union of the resulting sequence \(\bigcup _n c_n\) and ask whether it is in A. The probability of this event is taken to be \([\!\![p^*]\!\!](c_0,A)\). This intuitive operational definition can be justified denotationally. However, the formal development is quite technical and depends on an application of the Kolmogorov extension theorem—see the full version of this paper [13].

The next theorem shows that the iteration operator satisfies a natural fixpoint equation. In fact, this property was the original motivation behind the operational definition we just gave. It can be used to describe the iterated processing performed by a network (Sect. 8), and to define the semantics of loops (Sect. 5.6).

Theorem 1

\( [\!\![p^*]\!\!] = [\!\![\mathtt {skip}\mathrel { \& }pp^*]\!\!]\).

Proof

To determine the probability \([\!\![p^*]\!\!](c_0,A)\), we sample \([\!\![p]\!\!](c_0)\) to get an outcome \(c_1\), then run the protocol \([\!\![p^*]\!\!]\) on \(c_1\) to obtain a set c, then ask whether \(c_0\cup c\in A\). Thus

$$ \begin{aligned}{}[\!\![p^*]\!\!](c_0,A)&= \int _{c_1}[\!\![p]\!\!](c_0,dc_1)\cdot [\!\![p^*]\!\!](c_1,\{c \mid c_0\cup c\in A\})\\&= [\!\![p^*]\!\!]([\!\![p]\!\!](c_0))(\{c \mid c_0\cup c\in A\})\\&= (\delta _{c_0}\mathrel { \& }[\!\![p^*]\!\!]([\!\![p]\!\!](c_0)))(A)\qquad \qquad \quad \text {by Lemma 1(iii)}\\&= ([\!\![\mathtt {skip}]\!\!](c_0)\mathrel { \& }[\!\![pp^*]\!\!](c_0))(A)\\&= [\!\![\mathtt {skip}\mathrel { \& }pp^*]\!\!](c_0,A). \end{aligned}$$

   \(\square \)

Note that this equation does not uniquely determine \([\!\![p^*]\!\!]\). For example, it can be shown that a probability measure \(\mu \) is a solution of \( [\!\![\mathtt {skip}^*]\!\!](\pi )=[\!\![\mathtt {skip}\mathrel { \& }\mathtt {skip}\mathrel {;}\mathtt {skip}^*]\!\!](\pi ) \) if and only if \(\mu (B_\pi )=1\). That is, \(\pi \) appears in the output set of \([\!\![\mathtt {skip}^*]\!\!](\pi )\) with probability 1. Also note that unlike KAT and NetKAT, \([\!\![p^*]\!\!]\) is not the same as the infinite sum of powers of p. The latter fails to capture the sequential nature of iteration in the presence of probabilistic choice.

5.6 Extended Tests

ProbNetKAT’s extended tests generalize NetKAT’s tests, which are predicates on the head packet in a history, to predicates over the entire history. An extended test is an element \(t\in 2^H\) used as a deterministic program with semantics

$$\begin{aligned}{}[\!\![t]\!\!](a)&\triangleq \delta _{a \cap t}. \end{aligned}$$

A test \(x = n\) is a special case in which \(t = \{\pi {:}\eta \mid \pi (x)=n\}\). For every extended test there is a corresponding measure:

$$\begin{aligned}{}[\!\![t]\!\!](\mu )&= \lambda A.\mu (\{a \mid a\cap t\in A\}). \end{aligned}$$

Using this construct, we can define encodings of conditionals and loops in the standard way:

$$ \begin{aligned} \mathsf {if}\ b\ \mathsf {then}\ p\ \mathsf {else}\ q&= bp \mathrel { \& }\bar{b}q&\mathsf {while}\ b\ \mathsf {do}\ p&= (bp)^*\bar{b}. \end{aligned}$$

Importantly, unlike treatments involving subprobability measures found in previous work [24, 38], the output here is always a probability measure, even if the program does not halt. For example, the output of the program while true do skip is the Dirac measure \(\delta _{\emptyset }\).
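As a small sanity check of the conditional encoding, the following discrete sketch implements extended tests and \( bp \mathrel { \& }\bar{b}q\) over sets of histories, with measures represented as dicts from frozensets to probabilities. The names `etest` and `ite` are ours, and the representation is an assumption, not the paper's construction.

```python
# Discrete sketch of extended tests and the conditional encoding
# if b then p else q = b;p & ¬b;q. A kernel maps a frozenset of
# histories to a dict {frozenset: probability}.

def etest(t):
    """Extended test: [[t]](a) = dirac(a ∩ t)."""
    t = frozenset(t)
    return lambda a: {frozenset(a) & t: 1.0}

def ite(b, p, q):
    """Run p on the part of the input satisfying b, q on the rest,
    and combine the two (independent) outputs with &."""
    b = frozenset(b)
    def k(a):
        a = frozenset(a)
        out = {}
        for x, pr in p(a & b).items():
            for y, qr in q(a - b).items():
                u = x | y
                out[u] = out.get(u, 0.0) + pr * qr
        return out
    return k
```

For instance, with `b = {"h1"}`, branching between a skip-like and a drop-like kernel keeps exactly the histories in b.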

6 Properties

Having defined the semantics of ProbNetKAT in terms of Markov kernels, we now develop some essential properties that provide further evidence in support of our semantics.

  • We prove that ProbNetKAT is a conservative extension of NetKAT—i.e., every deterministic ProbNetKAT program behaves like the corresponding NetKAT program.

  • We present some additional properties enjoyed by ProbNetKAT programs.

  • We show that ProbNetKAT programs can generate continuous measures from discrete inputs, demonstrating that our use of Markov kernels is truly necessary and that no semantics based on discrete measures would suffice.

  • Finally, we present a tempting alternative “uncorrelated” semantics and show that it is inadequate for defining the semantics of ProbNetKAT.

6.1 Conservativity of the Extension

Although ProbNetKAT extends NetKAT with new probabilistic operators, the addition of these operators does not affect the behavior of purely deterministic programs. We will prove that this property is indeed true of our semantics—i.e., ProbNetKAT is a conservative extension of NetKAT.

First, we show that programs that do not use choice are deterministic:

Lemma 2

All syntactically deterministic ProbNetKAT programs p (those without an occurrence of \(\mathrel {\oplus _r}\)) are (semantically) deterministic. That is, for any \(a\in 2^H\), the distribution \([\!\![p]\!\!](a)\) is a point mass.

Next we show that the semantics agree on deterministic programs. Let \([\!\![\cdot ]\!\!]_{\mathrm N}\) and \([\!\![\cdot ]\!\!]_{\mathrm P}\) denote the semantic maps for NetKAT and ProbNetKAT respectively.

Theorem 2

For deterministic programs, ProbNetKAT semantics and NetKAT semantics agree in the following sense. For \(a\in 2^H\), define \([\!\![p]\!\!]_{\mathrm N}(a)=\bigcup _{\tau \in a}[\!\![p]\!\!]_{\mathrm N}(\tau )\). Then for any \(a,b\in 2^H\) we have \([\!\![p]\!\!]_{\mathrm N}(a)=b\) if and only if \([\!\![p]\!\!]_{\mathrm P}(a) = \delta _{b}\).

Using the fact that the NetKAT axioms are sound and complete [1, Theorems 1 and 2], we immediately obtain the following corollary:

Corollary 1

The NetKAT axioms are sound and complete for deterministic ProbNetKAT programs.

Besides providing further evidence that our probabilistic semantics captures the intended behavior, these theorems also have a pragmatic benefit: they allow us to use NetKAT to reason about deterministic terms in ProbNetKAT programs.

6.2 Further Properties

Next, we identify several natural equations that are satisfied by ProbNetKAT programs. The first two equations show that \(\mathtt {drop}\) is a left and right unit for the parallel composition operator \( \mathrel { \& }\):

$$ [\!\![p \mathrel { \& }\mathtt {drop}]\!\!] = [\!\![p]\!\!] = [\!\![\mathtt {drop}\mathrel { \& }p]\!\!] $$

This equation makes intuitive sense as deterministically dropping all inputs should have no effect when composed in parallel with any other program. The next equation states that \(\oplus _r\) is idempotent:

$$ [\!\![p \oplus _r p]\!\!] = [\!\![p]\!\!] $$

Again, this equation makes sense intuitively as randomly choosing between p and itself is the same as simply executing p. The next few equations show that parallel composition is associative and commutative:

$$ \begin{aligned}{}[\!\![(p \mathrel { \& }q) \mathrel { \& }s]\!\!]&= [\!\![p \mathrel { \& }(q \mathrel { \& }s)]\!\!]\\ [\!\![p \mathrel { \& }q]\!\!]&= [\!\![q \mathrel { \& }p]\!\!] \end{aligned}$$

The next equation shows that the arguments to random choice can be exchanged, provided the bias is complemented:

$$ [\!\![p \oplus _r q]\!\!] = [\!\![q \mathrel {\oplus _{1-r}} p]\!\!] $$

The final equation describes how to reassociate expressions involving random choice, which is not associative in general. However, by explicitly keeping track of biases, we can obtain an equation that captures a kind of associativity:

$$ [\!\![\left( p \oplus _{\frac{a}{a+b}} q\right) \mathrel {\oplus _{\frac{a+b}{a+b+c}}} s]\!\!] = [\!\![p \mathrel {\oplus _{\frac{a}{a+b+c} }} \left( q \mathrel {\oplus _{\frac{b}{b+c}}} s\right) ]\!\!] $$

Next we develop some additional properties involving deterministic programs.

Lemma 3

Let p be a deterministic program with \([\!\![p]\!\!](a) = \delta _{f(a)}\). The function \(f:2^H\rightarrow 2^H\) is measurable, and for any measure \(\mu \), we have \([\!\![p]\!\!](\mu ) = \mu \circ f^{-1}\).

As we have seen in Lemma 1(vi), \( \mathrel { \& }\) is not idempotent except in the deterministic case. Neither does sequential composition distribute over \( \mathrel { \& }\) in general. However, if the term being distributed is deterministic, then the property holds:

Lemma 4

If p is deterministic, then

$$ \begin{aligned}{}[\!\![p(q\mathrel { \& }r)]\!\!]&= [\!\![pq\mathrel { \& }pr]\!\!]&[\!\![(q\mathrel { \& }r)p]\!\!]&= [\!\![qp\mathrel { \& }rp]\!\!]. \end{aligned}$$

Neither equation holds unconditionally.

Finally, consider the program \(\mathtt {skip}\mathrel {\oplus _r}{{\mathrel {\mathtt {dup}}}}\). This program does nothing with probability r and duplicates the head packet with probability \(1-r\), where \(r\in [0,1)\). We can show that independent of r, the value of the iterated program on any single packet \(\pi \) is the point mass

$$\begin{aligned}{}[\!\![(\mathtt {skip}\oplus _r{\mathrel {\mathtt {dup}}})^*]\!\!](\pi )&= \delta _{\{\pi ^n \mid n\ge 1\}}. \end{aligned}$$
(6.1)

6.3 A Continuous Measure

Without the Kleene star operator or \(\mathrel {\mathtt {dup}}\), a ProbNetKAT program can generate only a discrete measure. This raises the question of whether it is possible to generate a continuous measure at all, even in the presence of \(^*\) and \(\mathrel {\mathtt {dup}}\). The question is important because, with only discrete measures, we would have no need for measure theory or integrals, and the semantics would be significantly simpler. It turns out that the answer is yes: it is possible to generate a continuous measure, and therefore discrete measures do not suffice.

To see why, let \(\pi _0\) and \(\pi _1\) be distinct packets and let p be the program that changes the current packet to either \(\pi _0\) or \(\pi _1\) with equal probability. Then consider the program \(p\mathrel {;}({\mathrel {\mathtt {dup}}}\mathrel {;}p)^*\). Operationally, it first sets the input packet to either \(\pi _0\) or \(\pi _1\) with equal probability, then repeats the following steps forever:

  1. (i)

    output the current packet,

  2. (ii)

    duplicate the current packet, and

  3. (iii)

    set the new current packet to \(\pi _0\) or \(\pi _1\) with equal probability.

This procedure produces outcomes a containing exactly one packet history of every length, linearly ordered by the suffix relation. Thus each possible outcome a corresponds to a complete path in an infinite binary tree. Moreover, the probability that a given history \(\tau \) is generated is \(2^{-|\tau |}\), so any particular set is generated with probability 0, because the probability that a set is generated cannot exceed the probability that any one of its elements is generated.

Theorem 3

Let \(\mu \) be the measure \([\!\![p\mathrel {;}({\mathrel {\mathtt {dup}}}\mathrel {;}p)^*]\!\!](\{\pi _0\})\).

  1. (i)

    For \(\tau \in H\), the probability that \(\tau \) is a member of the output set is \(2^{-|\tau |}\).

  2. (ii)

    Two packet histories of the same length are generated with probability 0.

  3. (iii)

    \(\mu (\{a\})=0\) for all \(a\in 2^H\), thus \(\mu \) is a continuous measure.

For the proof, see the full version of this paper [13].

In fact, the measure \(\mu \) is the uniform measure on the subspace of \(2^H\) consisting of all sets that contain exactly one history of each length and are linearly ordered by the suffix relation. This subspace is homeomorphic to the Cantor space.
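A finite truncation of this process can be computed exactly, which makes the quantitative claims above easy to check. The following sketch (our own encoding, not the paper's measure-theoretic construction) represents a history by the bit string of packet choices that produced it, so an outcome of the first m rounds is the chain of prefixes of a uniformly random m-bit string.

```python
from itertools import product

def truncated_tree(m):
    """Exact output distribution of the first m rounds of the process
    p;(dup;p)^* with two packets, histories encoded as bit strings:
    each outcome contains exactly one history of each length 1..m,
    and all 2^m outcomes are equally likely."""
    out = {}
    for bits in product("01", repeat=m):
        chain = frozenset("".join(bits[:k]) for k in range(1, m + 1))
        out[chain] = out.get(chain, 0.0) + 2.0 ** -m
    return out
```

At truncation level m, the history "01" (length 2) lies in the output with probability \(2^{-2}\), while every individual outcome has probability \(2^{-m}\), which vanishes as \(m\rightarrow \infty \), foreshadowing Theorem 3(iii).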

6.4 Uncorrelated Semantics

It is tempting to consider a weaker uncorrelated semantics

$$\begin{aligned}{}[p]:2^H\rightarrow [0,1]^H \end{aligned}$$

in which \([p](a)(\tau )\) gives the probability that \(\tau \) is contained in the output set on input a. Indeed, this semantics can be obtained from the standard ProbNetKAT semantics as follows:

$$\begin{aligned}{}[p](a)(\tau )&\triangleq [\!\![p]\!\!](a)(B_\tau ). \end{aligned}$$

However, although this semantics is simpler in that it does not require continuous measures, it loses correlations between packets. Worse, it is not compositional, as the following example shows. Let \(\pi _0\), \(\pi _1\) be two packets and consider the programs \(\pi _0!\oplus \pi _1!\) and \( (\pi _0!\mathrel { \& }\pi _1!)\oplus \mathtt {drop}\), where \(\pi !\) is the program that sets the current packet to \(\pi \). Both programs have the same uncorrelated meaning:

$$ \begin{aligned}{}[\pi _0!\oplus \pi _1!](a)(\pi )&= [(\pi _0!\mathrel { \& }\pi _1!)\oplus \mathtt {drop}](a)(\pi ) = \textstyle \frac{1}{2} \end{aligned}$$

for \(\pi \in \{\pi _0,\pi _1\}\) and \(a\ne \emptyset \) and 0 otherwise. But their standard meanings differ:

$$ \begin{aligned}&[\!\![\pi _0!\oplus \pi _1!]\!\!](a) = \textstyle \frac{1}{2}\delta _{\{\pi _0\}}+\frac{1}{2}\delta _{\{\pi _1\}}\\&[\!\![(\pi _0!\mathrel { \& }\pi _1!)\oplus \mathtt {drop}]\!\!](a) = \textstyle \frac{1}{2}\delta _{\{\pi _0,\pi _1\}}+\frac{1}{2}\delta _{\emptyset }, \end{aligned}$$

Moreover, composing on the right with \(\pi _0!\) yields \(\delta _{\{\pi _0\}}\) and \(\frac{1}{2}\delta _{\{\pi _0\}}+\frac{1}{2}\delta _{\emptyset }\), respectively, which have different uncorrelated meanings as well. Thus we have no choice but to reject the uncorrelated semantics as a viable alternative.
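The counterexample can be computed exactly with the discrete representation of measures as dicts from frozensets to probabilities (the names `prog1`, `prog2`, `marginal`, and `then_pi0` are ours):

```python
# The two programs from the text, as output distributions over subsets
# of {"pi0", "pi1"} on a nonempty input.
prog1 = {frozenset({"pi0"}): 0.5, frozenset({"pi1"}): 0.5}   # pi0! ⊕ pi1!
prog2 = {frozenset({"pi0", "pi1"}): 0.5, frozenset(): 0.5}   # (pi0! & pi1!) ⊕ drop

def marginal(mu, h):
    """Uncorrelated meaning [p](a)(h): probability that h is in the output."""
    return sum(p for a, p in mu.items() if h in a)

def then_pi0(mu):
    """Compose on the right with pi0!: every nonempty output set becomes
    {pi0}; the empty set stays empty."""
    out = {}
    for a, p in mu.items():
        b = frozenset({"pi0"}) if a else frozenset()
        out[b] = out.get(b, 0.0) + p
    return out
```

The two programs agree on all marginals yet differ as distributions, and after composing with \(\pi _0!\) they differ even in their uncorrelated meanings, confirming the non-compositionality argument above.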

7 Approximation

In this section we show that every program can be approximated arbitrarily closely by a loop-free program, using a suitable notion of approximation for the iterates of a loop. Intuitively, this explains why in many real-world scenarios it suffices to only keep track of finite and discrete distributions. Approximation in the context of bisimulation of Markov processes has been studied previously by a number of other authors [8–10, 28, 38, 39].

7.1 Weak Convergence of \(p^{(M)}\) to \(p^*\)

In Sect. 5.5, we defined \([\!\![p^*]\!\!]\) operationally in terms of an infinite process. To get \([\!\![p^*]\!\!](c_0,A)\), we compute an infinite sequence \(c_0,c_1,\ldots \) where in the \(n^{\mathrm {th}}\) step we sample \([\!\![p]\!\!](c_n)\) to get \(c_{n+1}\). Then we take the union of the \(c_n\) and ask whether it is in A. We proved that the resulting kernel exists and satisfies \( [\!\![p^*]\!\!] = [\!\![\mathtt {skip}\mathrel { \& }p\mathrel {;}p^*]\!\!]\).

Now let \(c_0,c_1,\ldots ,c_{m-1}\) be the outcome of the first m steps of this process, and let \([\!\![p^{(m)}]\!\!](c_0,A)\) be the probability that \(\bigcup _{n=0}^{m-1}c_n\in A\). This gives an approximation to \([\!\![p^*]\!\!](c_0,A)\). Formally, define

$$ \begin{aligned} p^{(0)}&= \mathtt {skip}&p^{(n+1)}&= \mathtt {skip}\mathrel { \& }p\mathrel {;}p^{(n)}. \end{aligned}$$

Note that \(p^{(n)}\) is not \(p^n\), nor is it \( p^0 \mathrel { \& }\cdots \mathrel { \& }p^n\).
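The recurrence defining \(p^{(n)}\) can be executed directly for discrete kernels. The following sketch is ours, under the assumption that measures are dicts from frozensets of histories to probabilities; it shows the approximants converging monotonically on a toy program.

```python
# Discrete sketch of the approximants p^(0) = skip,
# p^(n+1) = skip & p;p^(n).

def par(mu, nu):                     # the & operation on measures
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a | b] = out.get(a | b, 0.0) + p * q
    return out

def push(k, mu):                     # extend a kernel to measures
    out = {}
    for a, p in mu.items():
        for b, q in k(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

def approx(p, m):
    """Return the kernel for p^(m)."""
    skip = lambda a: {frozenset(a): 1.0}
    k = skip
    for _ in range(m):
        k = (lambda kn: lambda a: par(skip(a), push(kn, p(a))))(k)
    return k
```

For a toy kernel p that adds a fixed history "x" with probability 1/2 (and does nothing once "x" is present), \(p^{(m)}\) applied to the empty set assigns \(\{x\}\) probability \(1-2^{-m}\), which converges to the probability 1 assigned by \(p^*\).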

The appropriate notion of convergence in this setting is weak convergence. A sequence of measures \(\mu _n\) converges weakly to a measure \(\mu \) if for all bounded continuous real-valued functions f, the expected values of f with respect to the measures \(\mu _n\) converge to the expected value of f with respect to \(\mu \). The following theorem captures the relationship between the approximations of an iterated program and the iterated program itself:

Theorem 4

The measures \([\!\![p^{(m)}]\!\!](c)\) converge weakly to \([\!\![p^*]\!\!](c)\).

See the full version of this paper for the proof [13].

7.2 Approximation by \(^*\)-Free Programs

We have observed that \(^*\)-free programs only generate finite discrete distributions on finite inputs. In this section we show that every program is weakly approximated to arbitrary precision by \(^*\)-free programs. The approximating programs are obtained by replacing each \(p^*\) with \(p^{(m)}\) for sufficiently large m.

This explains why only finite discrete distributions arise in most applications: typically we start with finite sets and iterate only finitely many times. For instance, this happens whenever there is a bound on the number of occurrences of \(\mathrel {\mathtt {dup}}\) in the strings generated by the program, viewed as a regular expression. So although the formal semantics requires continuous distributions and integration, in many real-world scenarios we only need finite and discrete distributions.

Theorem 5

For every ProbNetKAT program p, there is a sequence of \(^*\)-free programs whose semantics converge weakly to \([\!\![p]\!\!]\).

The proof uses Theorem 4 and the fact that all program constructors are continuous with respect to weak convergence.

Fig. 4.

Topologies used in case studies: (a) fault tolerance, (b) load balancing, and (c) gossip protocols.

8 Applications

In this section, we demonstrate the expressiveness of ProbNetKAT’s probabilistic operators and the power of its semantics by presenting three case studies drawn from scenarios that commonly arise in real-world networks. Specifically, we show how ProbNetKAT can be used to model and analyze expected delivery in the presence of failures, expected congestion with randomized routing schemes, and expected convergence with gossip protocols. To the best of our knowledge, ProbNetKAT is the first high-level network programming language that adequately handles these and other examples involving probabilistic behavior.

8.1 Fault Tolerance

Failures are a fact of life in real-world networks. Devices and links fail due to factors ranging from software and hardware bugs to interference from the environment such as loss of power or cables being severed. A recent empirical study of data center networks by Gill et al. [14] found that failures occur frequently and can cause issues ranging from degraded performance to service disruptions. Hence, it is important for network operators to be able to understand the impact of failures—e.g., they may elect to use routing schemes that divide traffic over many diverse paths in order to minimize the impact of any given failure.

We can encode failures in ProbNetKAT using random choice and \(\mathtt {drop}\): the idiom \(p \oplus _d \mathtt {drop}\) encodes a program that succeeds and executes p with probability d, or fails and executes \(\mathtt {drop}\) with probability \(1-d\). Note that since \(\mathtt {drop}\) produces no packets, it accurately models a device or link that has crashed. We can then compute the probability that traffic will be delivered under an arbitrary forwarding scheme.

As a concrete example, consider the topology depicted in Fig. 4(a), with four switches connected in a diamond. Suppose that we wish to forward traffic from \(S_1\) to \(S_4\) and we know that the link between \(S_1\) and \(S_2\) fails with 10 % probability (for simplicity, in this example, we will assume that the switches and all other links are reliable). What is the probability that a packet that originates at \(S_1\) will be successfully delivered to \(S_4\), as desired?

Obviously the answer to this question depends on the configuration of the network—using different forwarding paths will lead to different outcomes. To investigate this question, we will encode the overall behavior of the network using several terms: a term p that encodes the local forwarding behavior of the switches; a term t that encodes the forwarding behavior of the network topology; and a term e that encodes the network egresses.

The standard way to model a link \(\ell \) is as the sequential composition of terms that (i) test the location (i.e., switch and port) at one end of the link; (ii) duplicate the head packet; and (iii) update the location to the other end of the link. However, because we are only concerned with end-to-end packet delivery in this example, we can safely elide the \(\mathrel {\mathtt {dup}}\) term. Hence, using the idiom discussed above, we would model a link \(\ell \) that fails with probability \(1-d\) as \(\ell \oplus _{d} \mathtt {drop}\). Since there is a 10 % probability of failure of the link \(S_1 \rightarrow S_2\), we encode the topology t as follows:

$$ \begin{aligned} t \triangleq&( sw = S_1; pt = 2; (( sw \leftarrow S_2; pt \leftarrow 1) \oplus _{.9} \mathtt {drop}))\\&\mathrel { \& }( sw = S_1; pt = 3; sw \leftarrow S_3; pt \leftarrow 1)\\&\mathrel { \& }( sw = S_2; pt = 4; sw \leftarrow S_4; pt \leftarrow 2)\\&\mathrel { \& }( sw = S_3; pt = 4; sw \leftarrow S_4; pt \leftarrow 3). \end{aligned}$$

Here, we adopt the convention that each port is named according to the identifier of the switch it connects to—e.g., port 1 on switch \(S_2\) connects to switch \(S_1\).

Next, we define the local forwarding policy p that encodes the behavior on switches. Suppose that we forward traffic from \(S_1\) to \(S_4\) via \(S_2\). Then p would be defined as follows: \( p \triangleq ( sw = S_1; pt \leftarrow 2) \mathrel { \& }( sw = S_2; pt \leftarrow 4) \) Finally, the egress predicate e is simply: \( e \triangleq sw = S_4 \).

The complete network program is then \((p;t)^*;e\). That is, the network alternates between forwarding on switches and topology, iterating these steps until the packet is either dropped or exits the network.

Using our semantics for ProbNetKAT, we can evaluate this program on a packet starting at \(S_1\): unsurprisingly, we obtain a distribution in which there is a 90 % chance that the packet is delivered to \(S_4\) and a 10 % chance it is dropped.

Going a step further, we can model a more fault-tolerant forwarding scheme that divides traffic across multiple paths to reduce the impact of any single failure. The following program \(p'\) divides traffic evenly between \(S_2\) and \(S_3\):

$$ \begin{aligned} p' \triangleq&( sw = S_1; ( pt \leftarrow 2 \oplus pt \leftarrow 3)) \mathrel { \& }( sw = S_2; pt \leftarrow 4) \mathrel { \& }( sw = S_3; pt \leftarrow 4) \end{aligned}$$

As expected, evaluating this policy on a packet starting at \(S_1\) gives us a 95 % chance that the packet is delivered to \(S_4\) and only a 5 % chance that it is dropped. The positive effect of randomized routing on fault tolerance has also been observed in previous work [53].
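Both delivery probabilities can be reproduced with a small exact computation. The sketch below abstracts a packet to a (switch, port) location and pushes a discrete distribution through the forwarding policy and the lossy topology; the encoding and the names (`deliver_prob`, `via_s2`, `split`) are ours, not ProbNetKAT syntax.

```python
# Exact evaluation of the diamond example: probability that a packet
# entering S1 reaches S4, with only the S1->S2 link failing.

def deliver_prob(policy, fail):
    dist = {("S1", "in"): 1.0}
    delivered = 0.0
    for _ in range(4):                       # enough hops for this topology
        nxt = {}
        for loc, pr in dist.items():
            for loc2, q in policy(loc, fail):
                if loc2 == ("S4", "out"):
                    delivered += pr * q
                else:
                    nxt[loc2] = nxt.get(loc2, 0.0) + pr * q
        dist = nxt
    return delivered

def via_s2(loc, fail):
    """Single path S1 -> S2 -> S4; the S1->S2 link fails with prob. fail."""
    if loc[0] == "S1":
        return [(("S2", "1"), 1.0 - fail)]   # lossy link: remainder dropped
    if loc[0] == "S2":
        return [(("S4", "out"), 1.0)]
    return []

def split(loc, fail):
    """Split 50/50 at S1 between S2 (lossy link) and S3 (reliable)."""
    if loc[0] == "S1":
        return [(("S2", "1"), 0.5 * (1.0 - fail)), (("S3", "1"), 0.5)]
    if loc[0] in ("S2", "S3"):
        return [(("S4", "out"), 1.0)]
    return []
```

With a 10 % failure probability, the single-path scheme delivers with probability 0.9 and the split scheme with probability 0.95, matching the analysis in the text.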

8.2 Load Balancing

In many networks, operators must balance demands for traffic while optimizing for various criteria such as minimizing the maximum amount of congestion on any given link. An attractive approach to these traffic engineering problems is to use routing schemes based on randomization: the operator computes a collection of paths that utilize the full capacity of the network and then maps incoming traffic flows onto those paths randomly. By spreading traffic over a diverse set of paths, such schemes ensure that (in expectation) the traffic will closely approximate the optimal solution, even though they only require a static set of paths in the core of the network.

Valiant load balancing (VLB) [49] is a classic randomized routing scheme that provides low expected congestion for any feasible demands in a full mesh. VLB forwards packets using a simple two-phase strategy: in the first phase, the ingress switch forwards the packet to a randomly selected neighbor, without considering the packet’s ultimate destination; in the second phase, the neighbor forwards the packet to the egress switch that is connected to the destination.

Consider the four-node mesh topology shown in Fig. 4(b). To encode this behavior, we assume that each switch has ports named 1, 2, 3, 4, that port i on switch i connects to the outside world, and that all other ports j connect to switch j. We can write a ProbNetKAT program for this load balancing scheme by splitting it into two parts, one for each phase of routing. VLB often requires that traffic be tagged in each phase so that switches know when to forward it randomly or deterministically, but in this example, we can use topological information to distinguish the phases. Packets coming in from the outside (port i on switch i) are forwarded randomly, and packets on internal ports are forwarded deterministically.

We model the initial (random) phase with a term \(p_1\):

Here we tacitly use an n-ary version of \(\oplus \) that chooses each summand with equal probability.

Similarly, we can model the second (deterministic) phase with a term \(p_2\):

Note that the guards \( sw = k; pt \ne k\) restrict to second-phase packets. The overall switch term p is simply \( p_1 \mathrel { \& }p_2\).

The topology term t is encoded with \(\mathrel {\mathtt {dup}}\) terms to record the paths, as described in Sect. 8.1.

The power of VLB is its ability to route \(nr/2\) load in a network with n switches and internal links of capacity r. In our example, \(n=4\) and r is 1 packet, so we can route 2 packets of random traffic with no expected congestion. We can model this demand with a term d that generates two packets with random origins and random destinations (writing \(\pi _{i,j,k}!\) for a sequence of assignments setting the switch to i, the port to j, and the identifier to k):

$$ \begin{aligned} d&\triangleq (\bigoplus _{k=1}^4 (\pi _{k,k,0}!) \mathrel { \& }\bigoplus _{k=1}^4 (\pi _{k,k,1}!)); (\bigoplus _{k=1}^4 dst \leftarrow k) \end{aligned}$$

The full network program to analyze is then \(d; (p;t)^*;p\). We can use techniques similar to those in the congestion example from Sect. 2 to reason about congestion. We first define a random variable to extract the information we care about. Let \(X_{\mathrm {max}}\) be a random variable equal to the maximum number of packets traversing a single internal link. By using the semantics as in Sect. 2, we can calculate that the expected value of \(X_{\mathrm {max}}\) is 1 packet—i.e., there is no congestion.

8.3 Gossip Protocols

Gossip (or epidemic) protocols are randomized algorithms that are often used to efficiently disseminate information in large-scale distributed systems [7]. An attractive feature of gossip protocols and other epidemic algorithms is that they are able to rapidly converge to a consistent global state while only requiring bounded worst-case communication. Operationally, a gossip protocol proceeds in loosely synchronized rounds: in each round, every node communicates with a randomly selected peer and the nodes update their state using information shared during the exchange. For example, in a basic anti-entropy protocol, a “rumor” is injected into the system at a single node and spreads from node to node through pair-wise communication. In practice, such protocols can rapidly disseminate information in well-connected graphs with high probability.

We can use ProbNetKAT to model the convergence of gossip protocols. We introduce a single packet to model the “rumor” being gossiped by the system: when a node receives the packet, it randomly selects one of its neighbors to infect (by sending it the packet), and also sends a copy back to itself to maintain the infection. In gossip terminology, this would be characterized as a “push” protocol since information propagates from the node that initiates the communication to the recipient rather than the other way around.

Fig. 5.

Gossip results.

We can make sure the nodes do not send out more than one infection packet per round by using a single incoming port (port 0) on each switch and exploiting ProbNetKAT’s set semantics: because infection packets are identical modulo location, multiple infection packets arriving at the same port are identified.

To simplify the ProbNetKAT program, we assume that the network topology is a hypercube, as shown in Fig. 4(c). The program for gossiping on a hypercube is highly uniform—assuming that switches are numbered in binary, we can randomly select a neighbor by flipping a single bit.
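In code, this bit-flip neighbor selection is simply an XOR with a power of two. The sketch below is ours (the function name is hypothetical), but it computes exactly the neighbor sets the uniform numbering gives.

```python
# Hedged sketch: on a d-dimensional hypercube with switches numbered in
# binary, the neighbors of a switch are obtained by flipping a single bit,
# i.e., XOR-ing the switch id with a power of two.
def hypercube_neighbors(sw, dim):
    return [sw ^ (1 << i) for i in range(dim)]

# Switch 000 in the 3-cube has neighbors 001, 010, and 100.
print([format(n, '03b') for n in hypercube_neighbors(0b000, 3)])  # ['001', '010', '100']
```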

The fragment of the switch program p for switch 000 is as follows:

$$ \begin{aligned} sw = 000; (( pt \leftarrow 001 \oplus pt \leftarrow 010 \oplus pt \leftarrow 100) \mathrel { \& } pt \leftarrow 0). \end{aligned}$$

The overall forwarding policy is obtained by combining analogous fragments for the other switches using parallel composition.

Encoding the topology of the hypercube as t, we can then analyze \((p; t)^*\) and use the ProbNetKAT semantics to calculate the expected value of \(X_{\mathrm {infected}}\), the number of infected nodes after a given number of rounds. The results for the first few rounds are shown in Fig. 5. This captures the usual behavior of a push-based gossip protocol.
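As a sanity check on such calculations, the expected number of infected nodes per round can also be approximated by Monte Carlo simulation of the model. This is a sketch under our own encoding of the 3-cube; the exact values come from the semantics, not from sampling.

```python
import random

# Hedged sketch: estimate E[X_infected] per round for push gossip on the
# 3-cube by simulation. Each infected switch flips one uniformly random bit
# of its id to pick the neighbor it infects, and stays infected itself.
def estimate_infected(rounds, trials=5000, dim=3, seed=0):
    rng = random.Random(seed)
    totals = [0.0] * (rounds + 1)
    for _ in range(trials):
        infected = {0}                   # rumor starts at switch 000
        totals[0] += len(infected)
        for r in range(1, rounds + 1):
            infected |= {n ^ (1 << rng.randrange(dim)) for n in infected}
            totals[r] += len(infected)
    return [t / trials for t in totals]
```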

9 Related Work

Work related to ProbNetKAT can be divided into two categories: (i) models and semantics for probabilistic programs and (ii) domain-specific frameworks for specifying and reasoning about network programs. This section summarizes the most relevant pieces of prior work in each of these categories.

9.1 Probabilistic Programming

Computational models and logics for probabilistic programming have been extensively studied for many years. Denotational and operational semantics for probabilistic while programs were first studied by Kozen [23]. Early logical systems for reasoning about probabilistic programs were proposed in a sequence of separate papers by Saheb-Djahromi, Ramshaw, and Kozen [24, 41, 44]. There are also numerous recent efforts [15, 16, 26, 28, 36]. Our semantics for ProbNetKAT builds on the foundation developed in these papers and extends it to the new domain of network programming.

Probabilistic programming in the context of artificial intelligence has also been extensively studied in recent years [2, 43]. However, the goals of this line of work are different from ours in that it focuses on Bayesian inference.

Probabilistic automata in several forms have been a popular model going back to the early work of Paz [40], as well as many other recent efforts [31, 45, 46]. Probabilistic automata are a suitable operational model for probabilistic programs and play a crucial role in the development of decision procedures for bisimulation equivalence, in logics for reasoning about behavior, in the synthesis of probabilistic programs, and in model-checking procedures [4, 8, 20, 27, 29]. In the present paper, we do not touch upon any of these issues, so the connections to probabilistic automata theory are thin. However, we expect they will play an important role in our future work—see below.

Denotational models combining probability and nondeterminism have been proposed in papers by several authors [19, 32, 48, 50], and general models for labeled Markov processes, primarily based on Markov kernels, have been studied extensively [10, 38, 39]. Because ProbNetKAT does not have nondeterminism, we have not encountered the extra challenges arising in the combination of nondeterministic and probabilistic behavior.

All of the systems mentioned above provide semantics and logical formalisms for specifying and reasoning about state-transition systems involving probabilistic choice. A crucial difference between our work and these efforts is that our model is not a state-transition model in the usual sense, but rather a packet-filtering model that filters, modifies, and forwards packets. Expressions denote functions that consume sets of packet histories as input and produce probability distributions over sets of packet histories as output. As demonstrated by our example applications, this view is appropriate for modeling the functionality of packet-switching networks. It has its own peculiarities and is different enough from standard state-based computation that previous semantic models in the literature do not immediately apply. Nevertheless, we have drawn much inspiration from the literature and exploited many similarities to provide a powerful formalism for modeling probabilistic behavior in packet-switching networks.

9.2 Network Programming

Recent years have seen an incredible growth of languages and systems for programming and reasoning about networks. Network programming languages such as Frenetic [11], Pyretic [35], Maple [51], NetKAT [1], and FlowLog [37] have introduced high-level abstractions and semantics that enable programmers to reason precisely about the behavior of networks. However, as mentioned previously, all of these languages are based on deterministic packet-processing functions, and do not handle probabilistic traffic models or forwarding policies. Of all these frameworks, NetKAT is the most closely related, as ProbNetKAT builds directly on its features.

In addition to programming languages, a number of network verification tools have been developed, including Header Space Analysis [21], VeriFlow [22], the NetKAT verifier [12], and Libra [52]. Similar to the network programming languages described above, these tools only model deterministic networks and verify deterministic properties.

Network calculus is a general framework for analyzing network behavior using tools from queuing theory [6]. It models the low-level behavior of network devices in significant detail, including features such as traffic arrival rates, switch propagation delays, and the behaviors of components like buffers and queues. This enables reasoning about quantitative properties such as latency, bandwidth, and congestion. Past work on network calculus can be divided into two branches: deterministic [30] and stochastic [18]. Like ProbNetKAT, the stochastic branch of network calculus provides tools for reasoning about probabilistic behavior, especially in the presence of statistical multiplexing. However, network calculus is generally considered difficult to use, since establishing many desired results can require external facts from queuing theory. In contrast, ProbNetKAT is a self-contained, language-based framework that offers general programming constructs and a complete denotational semantics.

10 Conclusion

Previous work [1, 12] has described NetKAT, a language and logic for specifying and reasoning about the behavior of packet-switching networks. In this paper we have introduced ProbNetKAT, a conservative extension of NetKAT with constructs for reasoning about the probabilistic behavior of such networks. To our knowledge, this is the first language-based framework for specifying and verifying probabilistic network behavior. We have developed a formal semantics for ProbNetKAT based on Markov kernels and shown that the extension is conservative over NetKAT. We have also determined the appropriate notion of approximation and have shown that every ProbNetKAT program is arbitrarily closely approximated by loop-free programs. Finally, we have presented several case studies that illustrate the use of ProbNetKAT on real-world examples.

Our examples have used the semantic definitions directly in the calculation of distributions, fault tolerance, load balancing, and a probabilistic gossip protocol. Although we have exploited several general properties of our system in these arguments, we have made no attempt to assemble them into a formal deductive system or decision procedure as was done previously for NetKAT [1, 12]. These questions remain topics for future investigation. We are hopeful that the coalgebraic perspective developed in [12] will be instrumental in obtaining a sound and complete axiomatization and a practical decision procedure for equivalence of ProbNetKAT expressions.

As a more practical next step, we would like to augment the existing NetKAT compiler [47] with support for the probabilistic constructs of ProbNetKAT, along with a formal proof of correctness. Features such as OpenFlow [33] “group tables” support simple forms of randomization, and emerging platforms such as P4 [3] offer additional flexibility. Hence, there already exist machine platforms that could serve as compilation targets for (restricted fragments of) ProbNetKAT.

Another interesting question is whether it is possible to learn ProbNetKAT programs from traces of a system, enabling active learning of running network policies. Such a capability would have many applications. For example, learning algorithms might be useful for detecting compromised nodes in a network. Alternatively, a network operator might use information from traceroute to learn a model that provides partial information about the paths from their own network to another autonomous system on the Internet.