# Using Information Flow to Design an ISA that Controls Timing Channels

Drew Zagieboylo Cornell University dzag@cs.cornell.edu G. Edward Suh Cornell University suh@ece.cornell.edu Andrew C. Myers *Cornell University* andru@cs.cornell.edu

Abstract—Information-flow control (IFC) enforcing languages can provide high assurance that software does not leak information or allow an attacker to influence critical systems. IFC hardware description languages have also been used to design secure circuits that eliminate timing channels. However, there remains a gap between IFC hardware and software; these two components are built independently with no abstraction for how to compose their security guarantees. This paper presents a proposal for an instruction set architecture (ISA) that can provide the appropriate abstraction for joining hardware and software IFC mechanisms. Our ISA describes a RISC-V processor that tracks information-flow labels at run time and uses these labels to eliminate or mitigate timing channels. To make the ISA more practical, it allows constrained downgrading of information; it permits trading off security for performance; and still offers control primitives such as system calls. We prove timing-sensitive noninterference modulo downgrading and nonmalleability for programs executing our ISA. This involves novel restrictions on the mutability of labels beyond previous dynamic IFC systems. Furthermore, we define specific security conditions which correct hardware can implement to provide software-level security and sketch how such hardware may be designed and verified.

#### I. INTRODUCTION

While timing channels have been well known to the security community for decades, recent hardware-based exploits attest that these vulnerabilities remain unsolved problems. For example, the Spectre, Meltdown, and Foreshadow attacks allow unprivileged processes to learn secrets by timing memory accesses [1]–[3]. The sophisticated security mechanisms provided by these modern processors—privilege rings, memory management units, and software guard extensions [4]—are completely undermined by uncontrolled timing behaviors. Current processors are not timing-safe.

The hardware-security community has investigated how to eliminate timing channels from circuit implementations, but these are not panaceas. Hardware description languages (HDLs) such as SecVerilog [5] and Caisson [6] provide timingsensitive noninterference. They ensure that the time at which "public" state is updated does not depend on any "secret" state. While they do provide useful primitives for implementing secure processors, these languages are not sufficient for executing timing-safe software in a real-world setting. They can preclude necessary operations (such as modifying security labels at run time) and limit software's ability to specify security policies by baking those policies into the hardware. In practice, software needs the ability to make applicationlevel policy decisions while still benefiting from the timingsensitive guarantees of security-focused HDLs. On the other hand, more complex instantiations of secure processors lack proofs that their ISAs enforce a meaningful security condition. The Hyperflow processor [7], for instance, allows bounded software modification of the "context label", but no ISA-level security condition gives guidance on how safe this is.

Software attempts to eliminate timing channels have had some success but ultimately are not comprehensive, instead targeting empirically known sources of timing variation. For example, compilers for cryptographic computation [8]–[10] help to mitigate side channels but are fundamentally incomplete, since they only model well known sources of timing variation such as branching and caching. To fully remove timing channels, a new interface is needed to constrain how hardware state influences timing and which software instructions might leak information [11], [12].

The missing link between these hardware and software approaches is an Instruction Set Architecture (ISA) with an explicit abstraction for the influence of the machine state on timing. With such an ISA, strong timing-sensitive security conditions could be proved about software, relying on the guarantees made by hardware.

As a straw man, a software–hardware contract might ensure that all instructions with secret operands execute in constant time. In fact, existing techniques for securely implementing cryptography implicitly assume such a contract. However, constant time inevitably means worst-case time, in general, so such a contract has daunting implications for the performance of memory operations. We argue that this kind of contract is unnecessarily restrictive. It is not necessary that such instructions take constant time; it is only necessary that the time taken does not leak information.

This paper presents an ISA design that can be the interface connecting high-level timing-sensitive software abstractions to low-level timing-safe processor implementations. Our ISA is based on information flow control (IFC), which means our software–hardware contract is a set of IFC properties, rather than a prescriptive set of implementation behaviors such as forcing certain instructions to take constant time. Because the interface is based on IFC, it is possible to formally prove that only permitted information affects timing.

Our ISA design includes features to avoid being overly restrictive, as IFC systems often are [13]. To this end, it

includes downgrading operations that allow software to endorse untrusted inputs and to declassify secret data. We also allow software to specify its own timing security policy, which permits trading off timing-channel protection for performance. Both of these features are limited so that they cannot be abused by attackers to undermine the security guarantees of wellbehaved programs. We additionally include security primitives that are required to implement a practical operating system. These instructions are analogous to traditional *system calls*, but they are designed to prevent unexpected information leakage.

The ISA in this paper tackles these goals with novel constructs and stronger formal security assurance:

- The ISA dynamically enforces timing-sensitive nonmalleable information flow [14], while also preventing implicit flows created by checking mutable labels.
- The ISA allows software to control the level of timingchannel protection. The ISA can be used to eliminate timing channels, mitigate timing channels with bounded information leak using predictive mitigation [11], or enforce nonmalleable information flow control without timing channel protection.
- The ISA also includes novel instructions for implementing privilege changes to emulate the functionality of system calls while maintaining nonmalleability.
- The ISA is accompanied by formal, proved security guarantees for programs implemented with it.
- We also formally specify security conditions with which hardware implementations must comply to ensure security of the ISA.

The paper proceeds as follows. Section II presents background on security labels and our attacker model. Section III sketches our approach to controlling timing channels. Sections IV and V formalize the ISA and discuss its novel features in detail. In Section VI, we discuss the security conditions assumed of the hardware and the practical challenges in realizing those policies with modern HDLs. Section VII presents the security results for this ISA and brief sketches of their proofs. Section VIII uses example code to demonstrate use of the ISA. In Section IX we discuss related work and we discuss future work in Section X.

# II. BACKGROUND

Our ISA both extends the RISC-V ISA<sup>1</sup> [15] with new instructions and modifies the semantics of existing instructions. RISC-V has instructions for computing on data, moving data to and from memory, and for changing program control flow. *Architectural* state refers to any storage location that is explicitly accessible or modifiable by software, including the 32 general-purpose registers, the program counter and all memory locations. Our extension modifies all architectural state to be associated with a security label. All other hardware state is considered *microarchitectural* and affects only the performance of software but not its functional behavior.

$$\begin{split} (i,c)^{\rightarrow} &\triangleq c\\ (i,c)^{\leftarrow} &\triangleq i\\ l_1 &\sqsubseteq l_2 \Leftrightarrow (l_1^{\leftarrow} \sqsubseteq l_2^{\leftarrow}) \land (l_2^{\rightarrow} \sqsubseteq l_1^{\rightarrow})\\ l_1 &\sqcup l_2 \triangleq ((l_2^{\rightarrow} \sqcup l_2^{\rightarrow}), (l_1^{\leftarrow} \sqcap l_2^{\leftarrow}))\\ l_1 &\sqcap l_2 \triangleq ((l_2^{\rightarrow} \sqcap l_2^{\rightarrow}), (l_1^{\leftarrow} \sqcup l_2^{\leftarrow}))\\ \boxtimes (i,c) \triangleq (c,i) \end{split}$$

#### Fig. 1. Security lattice operators

The complete RISC-V ISA has many Control Status Registers (CSRs) which are considered architectural, but for brevity we omit most of them from our formalization. These CSRs should in principle also each have their own security labels.

# A. Security Labels

As in most IFC systems, our security labels form a lattice that supports a "flows to" relation  $\sqsubseteq$ , a lattice join  $\sqcup$  and a lattice meet  $\sqcap$ . We use the phrase "more restrictive" to refer to labels higher in the lattice ordering (e.g.  $a \sqsubseteq b$  means "b is at least as restrictive as a"). Figure 1 defines useful and mostly standard notation for label reference and manipulation. The label lattice is a product of two other lattices, one for integrity (trustworthiness of data) and one for confidentiality (secrecy of data), so a lattice element is a pair (i, c). For generality, we represent the two component lattices abstractly, but we restrict them to be dual lattices over the same carrier set. That is, the ordering  $\sqsubseteq$  is reversed for the integrity and confidentiality components of the label lattice. The *reflection operator*  $\boxtimes$ , used for controlled downgrading, swaps the two components of a lattice element.

An illustrative instantiation of this lattice is for the component lattice elements to represent principals. For instance, component *b* could represent both *Bob's* integrity (data written by Bob) and *Bob's* confidentiality (data readable by Bob), where *Bob* is a user of the system. Bob's data can flow to anywhere that has a label at least as confidential and no more trusted than *b*. Suppose there is a principal  $\top$  that is least in the integrity ordering (meaning that it is trusted by everyone) and greatest in the confidentiality ordering; conversely,  $\bot$  is highest in the integrity ordering (meaning that it is untrusted) and least in confidentiality. Then data labeled  $(\top, b)$  flows to the label  $(b, \top)$  because in integrity we have  $\top \sqsubseteq b$  and in confidentiality,  $b \sqsubseteq \top$ .

# B. Downgrading

Downgrading is the act of lowering the label of data in the lattice, violating the normal direction of information flow expressed by the lattice ordering. While downgrading greatly improves expressibility, it is important to constrain it, so that an attacker cannot leverage the downgrading mechanism to extract more secrets or modify more trusted state than the application developer intended. Our ISA enforces nonmalleability, a form of constrained downgrading, defined by

 $<sup>^{\</sup>rm l}{\rm Our}$  approach is not specific to RISC-V and could be adapted for use in other instruction sets.

Cecchetti et al. [14]. Nonmalleability guarantees both *robust declassification* and its dual *transparent endorsement*, which respectively constrain the downgrading of confidentiality and integrity.

We define *compromised* labels to represent exactly the set of labels that can never be safely downgraded under nonmalleability.

**Definition 1** (Compromised Labels). A label is compromised if it is not as trusted as it is secret:

# $l \not\sqsubseteq \chi(l)$

Intuitively, compromised data contains secret information but has been modified by an attacker or other low integrity source. Allowing such data to be downgraded opens up the possibility of "confused deputy" style attacks, where trusted code that executes downgrades can be tricked into downgrading arbitrary data.

## C. Attackers

We represent attackers by the maximal integrity  $i_A$  with which they can act and a minimal confidentiality  $c_A$  that they cannot observe. This is equivalent to typical attacker definitions which use a *maximal confidentiality*  $c_M$  the attacker can observe. Since we assume a finite lattice, we can translate  $c_M$  to  $c_A$  as follows:

$$L_s = \{l \mid l \not\sqsubseteq c_M\}$$
$$c_A \equiv \bigvee_{l_s \in L_s} l_s$$

 $c_A$  represents the disjunction of all labels which  $c_M$  is not allowed to read, and therefore defines the minimal confidentiality that they cannot observe.

It is convenient to summarize the attacker as a single label  $A = (i_A, c_A)$ . As depicted in Figure 2, the components  $c_A$  and  $i_A$  define upward-closed sets of secret and untrusted labels:

$$\mathcal{S} = \{l \mid c_A \sqsubseteq l\}$$
$$\mathcal{U} = \{l \mid i_A \sqsubseteq l\}$$

The sets of public  $(\mathcal{P})$  and trusted  $(\mathcal{T})$  labels are simply any labels not in  $\mathcal{S}$  or  $\mathcal{U}$ , respectively. Attackers can only read public data and can only write to untrusted data.

*a)* Fair Attacks: Similar to prior work on robust declassification [16], our security guarantees hold against *fair attacks*, where high secrecy and high integrity information are only protected from attackers that do not already know those secrets or are not already highly trusted. In this work, *fair attacks* are defined as those where A represents a compromised label:

**Definition 2** (Fair Attacker). Attacker  $A = (i_A, c_A)$  is a fair attacker if and only if A is a compromised label.

Since a given attacker may be partly trusted with respect to integrity and confidentiality, the label A is not a fixed, known label. Rather, we consider the system to be secure if it is secure against all possible fair attackers A.



Fig. 2. A 2-D slice of the combined confidentiality and integrity lattice. The red section represents all compromised labels. The dotted lines represent valid boundaries specifying a particular attacker model and dividing the lattice into quadrants. The intersection of these lines must be a *compromised* label, but need not be the same in each component lattice.

Our earlier *Bob* example can illustrate why this definition eliminates unfair attackers. In a security lattice including the orderings  $(\top, \bot) \sqsubseteq (b, b) \sqsubseteq (\bot, \top)$ , consider the attacker with *Bob's integrity* who is only allowed to read fully public data: A = (b, b).<sup>2</sup> A is not a fair attacker: it is as trusted as *Bob* (and can therefore impersonate him) but is not supposed to learn any of *Bob's* secrets. Essentially, this A would model *Bob* attacking himself. Our security condition does not prevent *Bob* from mistakenly releasing his own data to the public; it prevents untrusted attackers from doing so and from manipulating *Bob* into doing so for them.

b) Other Assumptions: We assume a strong attacker that may observe the wall-clock time at which writes to public locations occur, and not just the ordering of writes. This observational power corresponds to a colocated attackercontrolled process that can race on memory accesses and has access to wall-clock time. Defending against such a strong attacker is preferable since it makes the security assurance correspondingly stronger.

Since our ISA implements a dynamic IFC system, attackers can observe the labels of data through the success or failure of run-time checks [17]. For example, if secret (S) is used (either directly or implicitly through branching ) to label another piece of data (D) as secret, then an attacker may learn information about S when their attempt to read D fails. The ISA does not include instructions for explicitly reading labels and therefore we assume attackers cannot directly read label values.

# **III. CONTROLLING TIMING CHANNELS**

Here we present high level examples of where timing channels arise and how we approach mitigating them. Figure 3 contains RISC-V code with a simple microarchitectural timing channel: a secret-dependent load causing cache interference. In this example, s0 is a secret value; a0 and a1 are public information. In modern processors, 1w ("load word") is not

<sup>&</sup>lt;sup>2</sup>Note that this label is *not compromised* since  $(b, b) \sqsubseteq (b, b)$ 

Fig. 3. Meltdown-style timing channel via microarchitectural state

```
# 10,11,12: public-untrusted int
# h1,h2: secret-trusted int
# secret: secret-trusted boolean
10 = 11
if (secret): h1 = 11; else: h1 = 12;
10 = 1
```

Fig. 4. Untrusted inputs causing secrets to leak via timing

a constant-time operation; its duration depends primarily on the address being accessed and other microarchitectural state (notably the cache). In this case, the address depends on s0, a secret offset into array a0. Loading the data at address s1 also causes some region of the a0 array to be placed in cache. If this region happens to be close to the beginning of the array, the second 1w experiences a cache hit and executes quickly. In this way, an attacker who can observe how long it takes to load public information learns some secret information. This vulnerability reflects the core information transfer mechanism of the Meltdown attack [2].

In our ISA, software specifies a timing label, an upper bound on what information may influence instruction completion timing. If the program in Figure 3 executed with a secret timing label, then it would have the same unsatisfactory timing guarantees as current software. However, if the timing label were set to public, then only public information could influence how long any instruction took and the latency of the second lw will not reveal any information about s0. Obviously, software running at a low timing label may not benefit from all possible performance optimizations, but it does not explicitly require hardware to take worst-case time.

Figure 4 represents a different kind of timing channel, where an attacker can determine information about secrets by observing how long secret-dependent operations take. In this example, the attacker primes the cache by loading a public value, 11. Then, by observing when 10 is updated, they can infer whether or not the memory read operation in between was a cache-hit or miss. If it was a hit, this implies that the true branch was taken, since 11 was already cached.

The problem here is related to the interaction of low integrity state with high confidentiality computation; a cache that has been tainted with an attacker's state should not be allowed to influence the duration of secret operations. We incorporate this idea into our upcall instruction, which allows software to execute in a secret context for a predetermined amount of time. Critically, low integrity attackers cannot upcall their way into learning secrets nor can they influence how trusted code execute their upcalls. By considering the relationship between *integrity* and *confidentiality*, we can allow programs similar to Figure 4 to execute safely, while disallowing variants that might leak information through timing.

# IV. FORMALIZING THE ISA

# A. Definitions and Model

In this section we present an abridged semantics for our ISA. First, we introduce the model for our semantics and some notational definitions. We represent our ISA as a small-step operational semantics on configurations.

| <b>Definition 3</b> (Configurations). A processor configuration rep | p- |
|---------------------------------------------------------------------|----|
| resents the current state of the processor, encompassing bo         | th |
| architecturally visible state and microarchitectural state.         |    |

| SW registers/memory       | $M: Int \to Int$                        |
|---------------------------|-----------------------------------------|
| SW label mappings         | $L: Int \rightarrow Lbl$                |
| opaque HW state           | $\mu: Name \rightarrow Lbl$             |
| program counter and label | $pc: PC = Int \times Lbl$               |
| cycle counter and label   | $t: T = Int \times Lbl$                 |
| call stack                | $CS: List(PC \times T)$                 |
| processor configuration   | $C:\langle CS, M, L, \mu, pc, t\rangle$ |

For simplicity, we represent both registers and DRAM as a single mapping M, in which registers are located at special addresses. Addresses are drawn from *Int*, a set of finite-size integers.<sup>3</sup> Name is a set of variable names, which can refer to locations but are not directly representable as values. *Lbl* is the set of labels representable in our lattice. For clarity, we abbreviate full configurations as  $C_i$ , where subscript *i* on elements disambiguates between source configurations (e.g.  $M_1$  is the software memory of configuration  $C_1$ ). Additionally, we use  $pc_v$  to refer to the value of the pc and  $pc_l$  to refer to its label. The same convention is used for *t*.

In order to reason about the security label of a given piece of state in the processor, we define various conventions for looking up label values and converting integers to labels.

**Definition 4** (Label lookup). Both architectural state and microarchitectural state are tagged with security labels. These functions describe how to determine the value of a location's label, where  $i \in Int$ , and  $n \in Name$ .

$$\begin{array}{c|c} \textit{Interpret } i \text{ as a Lbl value} & \gamma(i) \\ \textit{Label of location } i & L(i) \\ \textit{Label of } n & \Gamma(C)(n) \end{array}$$

 $\Gamma$  is a function parameterized on processor state. This function is defined statically for a given implementation of the hardware at design time. This parameterization allows the label of any location to depend on software-specified values and/or other run-time microarchitectural state.

#### **B.** Operational Semantics

We present this ISA as a small-step operational semantics, factored into two semantics: a partial semantics specified by software instructions and an opaque hardware semantics that

<sup>&</sup>lt;sup>3</sup>The size of this range (for example, 32 or 64 bits) is architecture-specific.

TABLE I MODIFIED SEMANTICS FOR STANDARD RISC-V INSTRUCTIONS

| Insn Type | Restrictions                                                          | Behavior                                     |
|-----------|-----------------------------------------------------------------------|----------------------------------------------|
| COMPUTE   | $pc_l \sqcup L(r_{s1}) \sqcup L(r_{s2}) \sqsubseteq L(r_d)$           | $M' = M[r_d \mapsto R_{s1} \otimes R_{s2}]$  |
| LOAD      | $pc_l \sqcup L(r_{s1}) \sqcup L(M(R_{s1})) \sqsubseteq L(r_d)$        | $M' = M[r_d \mapsto M(R_{s1})]$              |
| STORE     | $pc_{l} \sqcup L(r_{s1}) \sqcup L(r_{d}) \sqsubseteq L(M(R_{s1}))$    | $M' = M[M(R_{s1}) \mapsto R_d]$              |
| BRANCH    | $L(r_{s1}) \sqcup L(r_{s2}) \sqsubseteq pc_l$                         | $pc' = (R_{s1} \otimes R_{s2})?imm : pc + 4$ |
| JUMP      | $L(r_{s1}) \sqsubseteq pc_l$                                          | $pc' = R_{s1}$                               |
| ALL_PC    | $L(M(pc_n)) \sqsubseteq pc_1 \land pc_1 \sqsubseteq \boxtimes (pc_1)$ | applies to all instructions                  |
| ALL_T     | $t_l \sqsubseteq \overleftarrow{(t_l)} \land pc_l \sqsubseteq t_l$    | applies to all instructions                  |

Fig. 5. Complete CPU operational semantics. These rules defer to semantics which describe how architectural state is modified  $(\longrightarrow_{\mathcal{A}})$  and which describe how microarchitectural state is modified  $(\longrightarrow_{\mathcal{A}})$ .

describes the behavior of microarchitectural state. Figure 5 shows the complete operational semantics for a CPU and how, in any given time step, the CPU can update architectural state (by taking a  $\longrightarrow_{\mathcal{A}}$  transition) or "stall" (from the perspective of software) by updating only microarchitectural state. While we provide the explicit semantics for  $\longrightarrow_{\mathcal{A}}$  (see Figure 7), the semantics for  $\longrightarrow_{\mu}$  are intentionally left unspecified because they are implementation-dependent. The architectural semantics  $(\longrightarrow_{\mathcal{A}})$  do not depend upon the current state of  $\mu$  since  $\mu$  should not, by definition, influence the behavior of software (beyond timing). Instead, we define a set of properties that the transition function  $\longrightarrow_{\mu}$  must satisfy. It is these properties that allows the ISA to offer security guarantees that current architectures lack.

Table I provides an abridged definition of instruction restrictions (also referred to as "label checks") and behavior for pre-existing RISC-V instructions. For abbreviation purposes, the notation  $r_x$  represents the index of a register specified by an instruction. To refer to the contents of the register, we write  $R_x$ , a shorthand for  $M(r_x)$ , the contents of the special memory location which holds that register. The symbol  $\otimes$ represents some arithmetic or relational operator appropriate to the instruction in question.

In general, the restrictions on instructions prevent state with high security labels from influencing state with low security labels. If the restrictions for a given rule cannot be met, the instruction becomes a "no-op" that increments  $pc_v$  but has no other effects. No-ops avoid leaking information through the enforcement of label checks. However, for certain errors, it is safe to jump to a special program counter, errorpc, while retaining the current  $pc_l$  and  $t_l$ . One such error is violation of the ALL\_PC rule, which can safely cause the program to jump to errorpc without breaking noninterference. The full list of these errors is specified in the technical report [18]. At this point, any error-handling program may execute (for example, to signal termination), as long as it obeys the restrictions on normal execution. To a public observer, a program that produces an error with a secret pc label therefore appears equivalent to a correctly operating program.

The COMPUTE, LOAD/STORE and BRANCH restrictions are straightforward; they ensure that instruction operands and the pc must flow to the destination register. The BRANCH restrictions prevent implicit flows.

The ALL\_PC restriction ensures that the instruction being executed is at least as trusted and public as the pc itself. This constraint prevents a trusted or public program from reading instructions from secret or untrusted memory. Additionally, ALL\_PC maintains the invariant that a program may execute only if it has an uncompromised pc. We note in Section V that keeping the pc uncompromised is required to prevent call gates from breaking nonmalleability.

The ALL\_T restriction ensures that the timing label is uncompromised and is *at least* as restrictive as the pc label. We summarize these restrictions as a validity condition:

$$ISVALID(pc_1, t_l) \triangleq (pc_1 \sqsubseteq t_l) \land (pc_1 \sqsubseteq \boxtimes (pc_l)) \land (t_l \sqsubseteq \boxtimes (t_l))$$

Intuitively, it would be difficult to implement any reasonable hardware that did not guarantee this condition. In any case where the pc label was more restrictive, the duration of the instruction would have to be independent of the instruction performed! This is obviously impractical for real systems, and the restriction allows us to mostly reason about  $pc_l$  when proving security conditions.

```
if (s):
    upgrade(ts, UNTRUSTED)
else:
    skip
tp := ts
```

Fig. 6. Leaking secrets via an integrity upgrade. Execution is successful exactly when s is false.

# C. Label Mutation

Figure 7 gives the operational semantics for instructions that modify label state or that raise or lower privilege.<sup>4</sup> Labelmutation instructions modify the labels of memory locations. It is well known that *flow-sensitive* monitors, including this ISA<sup>5</sup>, can leak information by modifying labels if mutation is not appropriately limited [17], [19]. Since our approach involves no extra static information about the executing software, we implement the *no-sensitive-upgrade* (NSU) policy [20]. The NSU policy dynamically prevents leaks by requiring that the *pc*<sub>l</sub> can flow to both the original label and the final label of the data.

However, this restriction does not eliminate all information leakage caused by label mutation. Consider the example in Figure 6. In this case, the label change is inside a secret context, which requires that the pc is secret and trusted. Register ts is secret and trusted and the upgrade makes it secret and *untrusted*. The label  $pc_l$  flows to both the original and final labels of ts, so the aforementioned rule is satisfied. Nevertheless, the final assignment (which occurs in a public context) to tp will succeed in the case where s is false and fail otherwise since ts now represents untrustworthy information.

Additionally, since label arguments themselves are labeled memory locations, we require that the label of those arguments flows to  $pc_l$ . For example, the instruction dwnlbl x3, x6 means: "Downgrade the label of register x3 to the label represented by the value stored in register x6". If the label of x6 itself were secret, using it to change the label of x3 in a public context could allow an observer to learn about the content of x6. If the label of a location whose content is used as a label does not flow to  $pc_l$ , then the instruction becomes a no-op to prevent this kind of leakage.

We introduce additional restrictions on both upgrade and downgrade rules to prevent similar kinds of information leakage; these rules differ from each other in order to be more permissive.

a) Upgrading: The predicate  $UPLBL(pc_l, l, l')$  expresses the NSU check for upgrading label l to label l' in the context  $pc_l$ :

$$UPLBL(pc_l, l, l') \triangleq (pc_l \sqsubseteq l \sqsubseteq l') \land (l' \sqsubseteq \not\boxtimes (pc_l))$$

The intuition here is that we need an upper bound for the final label to prevent it from moving to a new quadrant in the lattice. *UPLBL* deviates from the original NSU definition by adding the constraint  $l' \sqsubseteq \boxtimes (pc_l)$ . This prevents programs from creating untrustworthy information in secret contexts and vice versa. For the program in Figure 6, the uplbl instruction fails the *UPLBL* test, preventing the offending label modification. Unfortunately, this *still* leaks the value of s since the program only fails when s is true. The key insight for handling this case is that the failure happens while the *pc* is still in a high context, so measures can be taken to prevent a low context from observing the failure. We discuss this leakage in further detail below (Section IV-D).

b) Downgrading: There are two different cases to consider when downgrading label l to  $l': l' \sqsubseteq l$  and  $l' \not\sqsubseteq l$ . For the first case, the predicate  $DWNLBL(pc_l, l, l')$  expresses the existing nonmalleable information flow restrictions when downgrading label l to label l' in the context  $pc_l$ .

$$DWNLBL(pc_l, l, l') \triangleq (pc_l \sqsubseteq l') \land (l' \sqsubseteq l) \land (l \sqsubseteq \boxtimes (l))$$

The other case is the general form of downgrading, which we model as first executing a downgrade from l to  $l \sqcap l'$ , followed by an upgrade to l'. As one might expect, this essentially combines the restrictions from those other cases:

$$RELBL(pc_l, l, l') \triangleq (pc_l \sqsubseteq l \sqcap l') \land (l \sqsubseteq \boxtimes (l)) \land (l' \sqsubseteq \boxtimes (pc_l))$$

This check implies the original nonmalleability restrictions,<sup>6</sup> which means it is no more permissive. In the cases where  $l \subseteq l'$  and  $l' \subseteq l$ , the check reduces to *UPLBL* and *DWNLBL*, respectively.

# D. Raising context labels

The upcall/upret instruction pair introduces primitives for controlling timing channels while branching on secret or untrusted values. The upcall instruction allows a process to enter a more restricted context with a higher  $pc_l$  and  $t_l$ , while pushing the current  $pc_l$  and  $t_l$  to a call stack. In the new context, the program cannot write to low outputs, but its execution timing can be influenced by high hardware state. However, returning from this context reveals timing information about the duration of the subprogram. This problem can be seen in the higher-level program shown in Figure 8. The low adversary is allowed to observe the time of completion for the while block, since it can observe the timing of the writes to public\_val. However, the duration of this block depends upon secret values. This example shows a more general version of the label-checking termination channel from Figure 6.

To control timing channels, upcall instructions are given an absolute end time and an ending program counter as arguments. Once the end time is reached, the processor steps to the end  $pc_v$ . The instruction arguments are saved onto a hardware call stack along with the caller's  $pc_l$  and  $t_l$ . Intuitively, this semantics preserves noninterference because

<sup>&</sup>lt;sup>4</sup>The  $R_{sn}$  notation refers to RISC-V style register addresses; instructionsize limitations require that the real encoding differ slightly from this notation, but it is semantically equivalent.

<sup>&</sup>lt;sup>5</sup>Although this ISA is flow-sensitive, it does not have floating labels [19], and therefore labels must be explicitly changed by software instructions.

<sup>&</sup>lt;sup>6</sup>In our setting, their requirement would roughly translate to the conditions:  $l \equiv l' \sqcup \mathbb{X}(pc_l \sqcup l)$  and  $pc_l \equiv l'$ .

 $\overline{GR \vdash \langle CS, M, L, pc, t \rangle} \longrightarrow_{\mathcal{A}} \overline{\langle CS, M', L', pc', t_l \rangle}$  $\frac{l = L(r_d) \qquad l' = \gamma(R_{s1}) \qquad RELBL(pc_l, l, l') \qquad L(r_{s1}) \sqsubseteq pc_l \qquad L' = L[r_d \mapsto l']}{GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS, M, L', (pc_v + 4, pc_l), t_l \rangle} \text{ DWNLBL}$  $\frac{l = L(r_d) \qquad l' = \gamma(R_{s1}) \qquad UPLBL(pc_l, l, l') \qquad L(r_{s1}) \sqsubseteq pc_l \qquad L' = L[r_d \mapsto l']}{GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS, M, L', (pc_v + 4, pc_l), t_l \rangle} \qquad \text{uplbl}$  $\frac{pc_{l}^{\prime} = \gamma(R_{s1}) \quad t_{l}^{\prime} = \gamma(R_{s2}) \quad ISVALID(pc_{l}^{\prime}, t_{l}^{\prime}) \quad L(r_{s1}) \sqcup L(r_{s2}) \sqcup L(r_{s3}) \sqcup L(r_{d}) \sqsubseteq pc_{l} \quad pc_{l} \sqcup t_{l} \sqsubseteq pc_{l}^{\prime} \sqsubseteq t_{l}^{\prime}}{GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS^{\prime}, M, L, (pc_{v} + 4, pc_{l}^{\prime}), t_{l}^{\prime} \rangle} \quad \text{UPCALL}$  $\frac{INUPCALL}{GR \vdash \langle CS, M, L, pc, t \rangle} \xrightarrow{(endt, t'_l)} = CS[head] \quad t_v \neq endt$  $\frac{INUPCALL}{GR \vdash \langle CS, M, L, pc, t \rangle} = \frac{CS[head]}{CS' = CS[tail]} \qquad t_v = endt$   $\frac{t_v = endt}{CS' = CS(tail)} \qquad t_v = endt$   $\frac{CS' = CS(tail)}{CS' = CS(tail)} \qquad t_v = endt$   $\frac{CS' = CS(tail)}{CS' = CS(tail)} \qquad t_v = endt$  $\begin{array}{c} \emptyset = CS[head] \quad endpc = pc_v + 4 \quad CS'[head] = ((endpc, pc_l), (null, t_l)) \\ = CS \quad (pc', t'_l) = GR(R_{s1}) \quad ISVALID(pc'_l, t'_l) \quad L(r_{s1}) \sqsubseteq pc_l \quad pc'_l \sqcup t'_l \sqsubset pc_l \\ \hline GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS', M, L, pc', t'_l \rangle \end{array}$  dwncall CS'[tail] = CS $\frac{((pc'_v, pc'_l), (null, t'_l)) = CS[head] \quad pc_l \sqcup t_l \sqsubset pc'_l \sqcap t'_l \quad CS' = CS[tail]}{GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS', M, L, (pc'_v, pc'_l), t'_l \rangle} \text{ dwnret}$  $\frac{pc'_{l} = \gamma(R_{s1}) \qquad t'_{l} = \gamma(R_{s2}) \qquad pc_{l} \sqsubseteq pc'_{l} \qquad t_{l} \sqsubseteq t'_{l} \qquad ISVALID(pc'_{l}, t'_{l}) \qquad L(r_{s1}) \sqcup L(r_{s2}) \sqsubseteq pc_{l} \\ \qquad GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS, M, L, (pc_{v} + 4, pc'_{l}), t'_{l} \rangle \qquad \text{RAISELBL}$  $\frac{\neg INUPCALL}{GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_{\mathcal{A}} \langle CS, M, L, (pc_v + 4, pc_l), t_l \rangle} \text{ other_error}$  $\frac{INUPCALL}{GR \vdash \langle CS, M, L, pc, t \rangle \longrightarrow_A \langle CS, M, L, pc, t_l \rangle} \text{ UPRET\_ERROR}$ 

Fig. 7. Operational semantics for downgrading and label-mutating instructions given a call-gate registry GR.

```
public_val = 0
while (secret_1 < secret_2):
    # do some slow computation
    secret_1++
public_val = 1</pre>
```

Fig. 8. Secrets may be learned from the timing of the write to public\_val.

the subprogram cannot modify memory locations or labels in a way that changes low observations. Since the completion of the upcall is determined purely from information of at most the level  $pc_l$ , no termination channel influences subsequent program steps.

In general, this simple approach will be difficult to use in

practice because it requires programmers or compilers to know impractically cycle-accurate durations of program segments. However, it does have a use case for running untrusted functions. The upcall instruction can be used to create a low-integrity sandbox that executes until the provided timeout expires.

*a)* Using upcalls for timing mitigation: To support a more flexible programming model, we also expose a generic interface for handling returns from high contexts via an exception. When the timer completes, if the current instruction is not an upret, the configuration steps to a known exception handler

 $pc_v$ .<sup>7</sup> Furthermore, when a label check fails inside of an upcall, the program simply stalls (i.e., steps to a new configuration where no architectural state has changed). Whichever of these conditions causes the exception is recorded in a status register (implemented as a CSR), with the high label of the upcall. In Figure 7, we use the *INUPCALL* check to specify whether or not a configuration is inside of an upcall by inspecting the head of the call stack. If *INUPCALL* is true, then the error can be handled normally, otherwise it should be squashed and the program should stall.

$$INUPCALL \triangleq \\ (((endpc, pc'_l), (endt, t'_l)) = CS[head]) \\ \land (pc'_l \sqsubseteq pc_l \land t'_l \sqsubseteq t_l)$$

With this primitive, the timing mitigation algorithms described in prior work [11], [21] can be implemented, enforcing bounded leakage on information from the high context. We note that this information release is still nonmalleable; both robust declassification and transparent endorsement are maintained under these mitigation mechanisms. Importantly, our restrictions prevent attackers from exploiting mitigation to exfiltrate arbitrary data.

Checking whether or not a high context subprogram failed due to violating the label check restrictions also represents a nonmalleable information release. The data in the status register can be declassified or endorsed to reveal whether or not a label check caused the subprogram to fail. Revealing this information violates the termination sensitivity of the subprogram noninterference. Although the subprogram cannot modify any low state, information is transferred via termination.

b) Further upcall restrictions: upcall and dwncall instructions may not be executed inside an upcall. Intuitively, a dwncall (which lowers  $pc_l$ ) would allow a process to produce public outputs while still inside the upcall, leaking information about its timing and progress. As mentioned, the arguments to the upcall instruction must also themselves be labeled such that they flow to the current  $pc_l$ . Without this requirement, secret or untrusted information could still influence the duration of the subprogram.

c) Permanently raising context labels: In addition to the upcall instruction, the  $pc_l$  and  $t_l$  can be raised by simply writing to them (they are implemented as CSRs). In order to preserve noninterference, the labels can only be raised in this way. Once raised, a program can only lower its context labels by executing a dwncall instruction. This limits the possible leakages caused by the program to outputs produced by the set of trusted functions which it is allowed to call. We discuss this further in the next section.

#### E. Lowering context labels

The dwncall/dwnret instructions allow programs to call into more-public and more-trusted contexts via *call gates*.

Call gates are essentially labeled functions that have been pre-registered by a public-trusted entity. The call-gate registry is effectively a read-only function lookup table.<sup>8</sup> A call gate registration contains a pc and  $t_l$ ; using a dwncall instruction sets the current pc and  $t_l$  to the gate's values while pushing the prior values onto a call stack. These instructions provide hardware support for the privilege escalation features described in prior work on security and information flow. In particular, they closely resemble the primitives required to implement *gates* from the Multics and HiStar operating systems [22], [23]. In those systems, gates were used respectively to call known functions with higher privileges than the caller, and to implement synchronous RPC.

#### F. Exceptions and Asynchrony

We do not include exception configuration or handling in our ISA formalism or formal security proof. In this section, we describe how one could incorporate these features into our ISA without compromising its security conditions. All exceptions have a triggering condition and an *exception program counter* (epc) that points to the interrupt service routine (ISR)<sup>9</sup>.

Trigger conditions can be specific to an ISA-extension or architecture and are often defined by the hardware. The epc is programmed by software and stored in a CSR. There are additional exception masking CSRs which software can use to suppress the trigger conditions. In general, in order for an exception to fire, the security label of all trigger conditions (including masks) must flow to the current  $pc_1$ ; otherwise, an attacker process may learn that an exception fired and deduce some secret related to its cause. For arithmetic exceptions such as integer overflow or divide-by-zero, this implies that the instruction operands flow to the current  $pc_1$ ; if they don't, the exception must be suppressed. The label of the pc while the ISR is actually handling the exception must also be lower bounded by all trigger inputs and the label of the epc register itself. In this way, if an exception trigger condition is secret, its handler must be executing in a secret context and cannot produce public outputs.

We believe the primary complications involved in integrating exception handling into such an ISA are as follows. First, it is not always clear how to label exception triggers. For example, should an incoming network packet signal be labeled public or could the timing of packet arrival give an attacker information about co-resident processes? Likely, this choice should be programmable by software depending on the threat model. Secondly, depending upon how hardware state is labeled, asynchronous exceptions (such as timers and incoming network packets) may be frequently dropped or delayed. In order to account for this, the processor and ISA may need to be modified to support batched handling of exceptions along predetermined schedules within the CPU itself. Additionally, it may be difficult to limit the number

<sup>&</sup>lt;sup>7</sup>Termination behavior can be configured on a per-program basis; it is only required that the configuration is completed using only information that is low relative to the program's original  $pc_l$ .

<sup>&</sup>lt;sup>8</sup>Using rules similar to the uplbl instruction, call gate entries can also be made more secret or less trusted without violating noninterference.

<sup>&</sup>lt;sup>9</sup>This is not the same as the RISC-V epc CSR, we are paraphrasing the exception handling mechanism for clarity.

of actual hardware signals that contribute to exception trigger conditions in real implementations. For example, Van Bulck et al. [24] found that Intel SGX implementations allowed the currently executing instruction to complete before handling certain exceptions. Waiting for instruction completion means that most control signals in the CPU would influence the exception trigger conditions. It is not always possible to immediately transfer control to the ISR without waiting for some state to clear in the CPU, and thus it may be challenging to implement practical exceptions that execute in contexts that have low confidentiality or high integrity.

# V. ISA DESIGN DISCUSSION

Here we highlight some salient points of our design and compare and contrast with other language-based IFC systems.

Compromised contexts and data undermine nonmalleability: The original nonmalleability paper [14] identified restrictions on downgrading that are equivalent to our observation that compromised labels cannot be downgraded to public or trusted status. We additionally notice that executing in a compromised context can unsafely leak information through timing. Specifically, this can violate the non-occlusion principle of declassification described by Sabelfeld and Sands [25]. Consider the scenario where upcall operations implement predictive mitigation, and therefore enforce nonmalleability (rather than noninterference). Allowing a process to raise its  $pc_l$  and/or  $t_l$  to a compromised level is unsound because it implicitly allows that process to declassify arbitrary data. With our restrictions, observing the duration of this subprogram leaks only the caller's secrets and is therefore robust; otherwise any information could be implicitly declassified via this channel.

Software can control how much information it leaks through timing channels: Our ISA provides strong guarantees with respect to timing. As long as a program keeps its timing label low and executes fully low-deterministic upcalls, it leaks no information through its timing behavior. However, programs are not strictly bound by these restrictions. By explicitly exposing the  $pc_l$ ,  $t_l$  and upcall timing to software, we grant programs the ability to weaken these restrictions gracefully to suit their needs. This provides important flexibility for situations where our threat model is overly strong or when application-specific data may only require probabilistic guarantees about timing consistency.

Limitations of Our ISA: While our ISA has strong security guarantees and important security primitives, there is much room for future research. First of all, our timing label mechanism does provide a bound on which information may be implicitly leaked through timing channels. However, this is a coarse-grained approach that could potentially leak *any* information below the timing label. This behavior is unlike the dwnlbl instruction, which explicitly denotes the memory location to be downgraded. Our ISA also does not incorporate explicit timing into any instructions other than upcall. While this lack of explicitness is beneficial for remaining implementation-agnostic, it does not give guidance on how to implement secure *and efficient* hardware. Yu et al. [26] describe an ISA which focuses on this performance aspect, by exposing more microarchitectural information in their ISA. Future secure ISAs and ISA extensions must be designed with both of these goals in mind, potentially leading to new semantics or completely novel timing-aware instructions.

Finally, our work only targets the single core subset of the RISC-V ISA and does not provide guidance on how to address multicore communication and interference. This realm of interconnected computing devices communicating via shared memory and coherence networks introduces many more opportunities for timing interference and side channel communication. Investigating this problem requires a significant further effort in analyzing the semantics of existing memory models, microarchitectural coherency guarantees and how to efficiently incorporate IFC labels into these protocols.

#### VI. HARDWARE SEMANTICS AND PROPERTIES

As mentioned earlier, an actual hardware implementation of this ISA will be a circuit which not only implements the software-visible semantics but which refines the full CPU semantics. We now discuss properties of a hardware implementation that are sufficient to guarantee the ISA-level security conditions. Additionally, we discuss the implications of these properties on hardware implementations and comment on what techniques may be utilized to verifiably construct hardware with said properties.

**Property 1** (Deterministic Execution). For any configuration C, and for all  $i \in \{1, 2\}$ 

$$C \longrightarrow_{\mu} \langle \mu_i, t_{vi} \rangle \implies ((\mu_1 = \mu_2) \land (t_{v1} = t_{v2}))$$

$$\land$$

$$C \longrightarrow C_i \implies C_1 = C_2$$

The operational semantics for the transition function on microarchitectural states must be deterministic. Furthermore, we assume that the full semantics which determines when to stall the processor is also deterministic.

We believe that this property can also be relaxed to allow for sources of nondeterminism (such as changes in clock frequencies, random number generators, etc.) as long as this nondeterminism is truly generated by noise or other public/trusted factors. Defining exactly what factors are public/trusted is a complex decision related to particular threat models and is out of scope for this paper.

**Property 2** (Single-Step Machine Noninterference). *Given a* set of low labels in the security lattice,  $\mathcal{L}$ ,

$$\forall C, i \in \{1, 2\}.$$

$$(C_1 =_{\mathcal{L}} C_2) \land (C_i \longrightarrow C'_i)$$

$$\implies ((\mu'_1 =_{\mathcal{L}} \mu'_2) \land (t'_{v_1} =_{\mathcal{L}} t'_{v_2}))$$

The hardware implementation must enforce a timingsensitive noninterference condition for microarchitectural state for all transitions. With this definition, the label of t effectively bounds which hardware state may affect the timing of operations (including the decision to stall or not stall computation). The above property also implies that  $\rightarrow_{\mu}$  enforces timingsensitive noninterference on  $\mu$  and t. Note that this noninterference condition only applies for microarchitectural state, not architectural state. The architectural state may be downgraded using the downgrade instructions in our ISA.

# Property 3 (Computability of Label Lookups).

# $\exists \Gamma, \forall C, n \in \operatorname{dom}(\mu), \Gamma(C)(n) \text{ is computable}$

Property 3 has so far been an implicit assumption. The function  $\Gamma$  is parameterized on all of the configuration state; it represents a function that must be computed at run time and therefore must be implemented in the microarchitecture. In combination with Property 2, this implies that the process of looking up microarchitectural labels does not violate noninterference [27]. It also implies that, after a configuration step  $C \longrightarrow C'$ ,  $\Gamma$  determines low equivalence by evaluating labels of  $\mu$  using C', not C (we formalize low equivalence further in Section VII).

Intuitively, the above properties suggest that there is no hardware-level information flow which violates timingsensitive noninterference except for flows that are explicitly induced by software instructions. For instance, declassifying a secret memory location, *loc*, with a dwnlbl instruction can only declassify microarchitectural state that specifically represents *loc*'s data. Section VII discusses the ISA-level security properties that we can obtain, given these hardware properties, in more detail.

#### A. Implications for Hardware Implementations

**Property 1** can be easily satisfied, for the most part, as processors are typically implemented as deterministic digital circuits. While some features require a notion of nondeterminism (such as random number generators or external sensor inputs), these can be modeled as the I/O to a deterministic digital circuit. In the design, one must label and build deterministic circuitry used to process these values (e.g. a buffer containing input packets from the network) but the non-determinism of the outside system has no direct impact on the security of the processor itself. As discussed in Section IV-F, this may lead to different low-level behaviors and performance characteristics in real implementations.

Furthermore, even features with somewhat unpredictable behavior can be modeled deterministically as long as their inputs are deterministic. For example, DVFS [28] modulates clock frequency during execution and can change the wallclock time of code execution. However, if those modulation decisions are made via a digital circuit and their inputs are deterministic, we can model DVFS as software-visible architectural state and guarantee that its use does not violate our security conditions.

**Property 2** requires a processor to be designed to remove timing channels through its microarchitecture. A recent publication [7] shows that such a tagged processor with strong

control for microarchitectural timing channels and potentially reasonable overheads is feasible. Yu et al. [26] have also shown recently that it is feasible to build a modern CPU with speculation, out-of-order execution and other microarchitectural optimizations while enforcing probabilistic-noninterference [29]. These results provide evidence that it is possible to build efficient secure hardware, with the appropriate ISA abstractions.

**Property 3** suggests that processor microarchitecture needs to be designed in a way that allows the security label of microarchitectural state to be determined. This property can be achieved by either statically labeling hardware modules at design time or by adding hardware tags to track runtime labels. Recursively, these tags are also microarchitectural state and their labels must also be computable. Therefore, real implementations will use both of these techniques (static vs. dynamic labels) since  $\Gamma$  is only computable if it eventually reaches a fixed point.

Our ISA provides hardware designers with the flexibility to choose how to realize timing-sensitive noninterference. For example, in order to remove cache timing channels, a processor designer may: statically partition a cache; dedicate a cache to one security level and flush it when the security level is lowered; bypass the cache; or even introduce scratchpad memory with a fixed latency, etc.

# B. Enforcing Timing-Sensitive Noninterference in Hardware

For strong security assurance, we ideally want to formally enforce the properties needed for a secure hardware implementation. There exist several efforts to develop securityannotated Hardware Description Languages (HDL) that can provide timing-sensitive noninterference guarantees, similar to the one we specify here [5], [30], [31]. Previous studies show that these security-annotated HDLs can be used to express realistic security policies and implement complex circuits that satisfy them [6], [7], [32], [33].

The primary challenge with proving Property 2 by using secure HDLs is that these languages do not have separate notions of "architectural" and "microarchitectural" state; the entire circuit is represented as a single state machine. Phrased another way, hardware and software are concerned with different definitions of observability; in the hardware description, all state is considered observable, even though software can only directly observe architectural state. This disconnect makes proving a hardware implementation correct challenging for a few specific reasons.

First, it is impossible to prove that an implementation that supports ISA-level downgrading provides microarchitectural noninterference. Any implementation of our ISA must contain downgrades at the HDL level, which correspond to those required to implement downgrading instructions. However, the noninterference guarantees provided by these HDLs are completely obviated by including downgrades; they cannot ensure that the information being downgraded is limited only to architectural state.

A second issue with proving hardware implementations secure is the difference in label equivalence models. We

assume that an attacker cannot read the value of a secret label, but can observe the fact that the label is secret. In the hardware, any location which stores a label value must itself be labeled. Given the attacker model above, it is unclear how to write down the label of this location. If we label it as public, then the HDL will allow us to define hardware that leaks the values of secret labels to attackers. If we label it as secret, then the HDL will conservatively disallow some safe label checking operations.

We believe that these problems may be solved by applying prior techniques for verifying CPU correctness (such as Pipecheck and RTLCheck [34], [35]). Moreover, these approaches could be augmented with formal verification tools specifically designed for IFC. For instance, Nickel [36] is a framework for proving noninterference that uses application specific definitions of observational equivalence. Investigating how to utilize these approaches to prove microarchitectural noninterference while supporting software-level downgrading and notions of observability is an interesting open research question.

#### VII. ISA SECURITY PROPERTIES

This section describes some of the security properties of this ISA and their performance and usability tradeoffs.

a) Low Equivalence: We start by formalizing the low equivalence of configurations, relative to a set of low labels,  $\mathcal{L}$ . This models the ability of an observer who can only differentiate between low states; two low-equivalent configurations appear identical to a "low observer". First, we define an equivalence operator on label mappings to formalize our notion that attackers cannot observe exact label values.

**Definition 5** (Label Lookup Domain Equivalence). For an attacker inducing label sets  $\mathcal{P}$ ,  $\mathcal{S}$ ,  $\mathcal{U}$ , and  $\mathcal{T}$ 

$$L_1 \approx L_2 \iff \forall n \in \operatorname{dom}(L).$$
$$(L_1(n) \in \mathcal{P} \iff L_2(n) \in \mathcal{P}) \land$$
$$(L_1(n) \in \mathcal{T} \iff L_2(n) \in \mathcal{T})$$

We define the  $\approx$  relation on the labels of microarchitecture similarly.

Figure 9 shows the definition of low equivalence for all configuration components. We assume that  $L, M, \mu$  and  $\Gamma$  are total functions so that domain equality is implicit. The requirements of low equivalence explicitly require that "label lookups" for both architectural and microarchitectural state return equivalent but not equal values for high labels. Call stack low equivalence requires that all entries with low  $pc_l$  are in the same position in the stack and are themselves low-equivalent. By construction, all low entries must be at the head of the stack<sup>10</sup> so it is sufficient to check that the low prefixes of each call stack are equivalent.

Definition 6 (Call Stack Prefix Low Equivalence).

$$CS_{1} \cong_{\mathcal{L}} CS_{2} \iff$$

$$(1) \quad CS_{1} = \emptyset \land \forall (pc^{i}, t^{i}) \in CS_{2}, pc^{i} \in \mathcal{H}$$
or
$$(2) \quad CS_{2} = \emptyset \land \forall (pc^{i}, t^{i}) \in CS_{1}, pc^{i} \in \mathcal{H}$$
or
$$(3) \quad CS_{1}[head] = (pc_{1}, t_{1}) =_{\mathcal{L}} (pc_{2}, t_{2}) = CS_{2}[head]$$

$$\land CS_{1}[tail] \cong_{\mathcal{L}} CS_{2}[tail]$$

b) Security Guarantees: All of the theorems in this section have full proofs which can be found in the accompanying technical report [18]. First, we show that executing programs that do not contain downgrade or call gate instructions preserve noninterference.

We use the term *valid* configurations to refer to configurations that were initialized with reasonable values. Specifically, the configurations satisfy the ALL\_PC and ALL\_T requirements and the initial call stacks are empty.

**Theorem 1** (Noninterference Modulo Downgrading and Call Gates).

For any two valid configurations,  $C_1$  and  $C_2$  and any low set of labels,  $\mathcal{L}$ , where no instruction is a dwnlbl, upcall, or dwncall:

$$(C_{i} \longrightarrow^{*} C_{i}^{'}) \land (C_{1} =_{\mathcal{L}} C_{2}) \implies C_{1}^{'} =_{\mathcal{L}} C_{2}^{'}$$

where  $\longrightarrow^*$  is the reflexive, transitive closure of  $\longrightarrow$ .

The proof is a straightforward structural induction on the operational semantics of the processor. By assuming Property 2, essentially all of the work in this proof requires proving noninterference of the  $\longrightarrow_{\mathcal{A}}$  semantics.

We next extend Theorem 1 to prove noninterference even when using upcall instructions.

# Theorem 2 (Noninterference Modulo Downgrading).

For any two valid configurations,  $C_1$  and  $C_2$ , and any low set of labels,  $\mathcal{L}$ , where no instruction is a dwnlbl or dwncall.

$$(C_i \longrightarrow^* C'_i) \land (C_1 =_{\mathcal{L}} C_2) \implies C'_1 =_{\mathcal{L}} C'_2$$

In the scenario covered by Theorem 1, once the  $pc_l$  was high, it could never be lowered again. That makes the noninterference proof trivial but also limits functionality. To prove Theorem 2, we show that all operational steps taken while an upcall is on the call stack can be modeled as a single operational step to low-equivalent configurations. We can show this since the end configuration of the upcall is predetermined by low-equivalent state and high pcs are noninterfering (i.e. programs executing with a high pc cannot modify any low visible state).

Note that while this theorem is termination-sensitive, it is not timing-sensitive. In the case where  $t_l \not\sqsubseteq pc_l$ , attackers may make observations about high state based on the timing of writes to low state. We present a corollary that provides timing sensitivity.

<sup>&</sup>lt;sup>10</sup>This is enforced by preventing dwncalls while inside of an upcall.

$$\begin{aligned} pc_1 &=_{\mathcal{L}} pc_2 \iff ((pc_{l1} \wedge pc_{l2}) \notin \mathcal{L}) \lor (pc_1 = pc_2) \\ t_1 &=_{\mathcal{L}} t_2 \iff ((t_{l1} \wedge t_{l2}) \notin \mathcal{L}) \lor (t_1 = t_2) \\ L_1 &=_{\mathcal{L}} L_2 \iff (L_1 \approx L_2) \land (\forall j \in \operatorname{dom}(\mathcal{L}). \quad L(j) \in \mathcal{L} \implies L_1(j) = L_2(j)) \\ M_1 &=_{\mathcal{L}} M_2 \iff (L_1 \approx L_2) \land (\forall j \in \operatorname{dom}(\mathcal{M}). \quad L(j) \in \mathcal{L} \implies M_1(j) = M_2(j)) \\ \mu_1 &=_{\mathcal{L}} \mu_2 \iff (\Gamma(C_1) \approx \Gamma(C_2)) \land (\forall n \in \operatorname{dom}(\mu). \quad \Gamma(C)(n) \in \mathcal{L} \implies \mu_1(n) = \mu_2(n)) \\ CS_1 &=_{\mathcal{L}} CS_2 \iff CS_1 \cong_{\mathcal{L}} CS_2 \\ C_1 &=_{\mathcal{L}} C_2 \iff (pc_1 =_{\mathcal{L}} pc_2) \land (t_1 =_{\mathcal{L}} t_2) \land (M_1 =_{\mathcal{L}} M_2) \land (\mu_1 =_{\mathcal{L}} \mu_2) \land (CS_1 =_{\mathcal{L}} CS_2) \end{aligned}$$

Fig. 9. Low Equivalence of Configuration Components, relative to "low" labels, L.

**Corollary 1** (Timing-Sensitive Noninterference Modulo Downgrading).

If  $(pc_l \in \mathcal{L} \implies t_l \in \mathcal{L})$  for all intermediate configurations and upcall regions have fixed durations, then Theorem 2 provides timing sensitivity.

This corollary ensures that any time that low writes are possible, the attacker will observe them occurring at the same time. Furthermore, the duration of high call gates will be determined by low information.

As defined in Section II, nonmalleability is essentially defined as maintaining both robust declassification and transparent endorsement. Even with no syntactic restrictions (unlike the prior theorems) our ISA enforces nonmalleability.

**Theorem 3** (Nonmalleable Information Flow). For attacker induced high label sets S and U and their respective complements,  $\mathcal{P}$  and  $\mathcal{T}$  and valid configurations,  $\forall \{s, u\} \in \{1, 2\}, C_{su}$ 

$$\begin{array}{ccc} ((C_{su} \longrightarrow C'_{su}) & \wedge (C_{Iu} =_{\mathcal{P}} C_{2u}) & \wedge (C_{sI} =_{\mathcal{T}} C_{s2})) \\ & \Longrightarrow \\ ((C'_{II} =_{\mathcal{P}} C'_{2I} \implies C'_{I2} =_{\mathcal{P}} C'_{22}) \\ & \wedge \\ (C'_{II} =_{\mathcal{T}} C'_{I2} \implies C'_{2I} =_{\mathcal{T}} C'_{22})) \end{array}$$

Assuming Theorem 2, we only need to reason about instructions which violate information flow: dwncall and uplbl. The key restrictions which provide nonmalleability are those that prevent the  $pc_l$  or  $t_l$  from becoming compromised and the restriction that compromised data is never downgraded.

# VIII. PROGRAM EXAMPLES

We now describe examples of how to use our ISA features in practical scenarios.

AES is a well known encryption algorithm which does not require the program to branch on any secrets [37]. Instead, AES uses a public lookup table indexed by computation involving both the secret key and public input. This behavior of executing secret-dependent memory accesses makes it susceptible to a number of timing-channel attacks [38]–[42], some of which are similar to the vulnerability in Figure 3.

Figure 10 is a toy version if this AES-style lookup table access in our ISA. Without mitigation techniques, the execution

Fig. 10. Mitigated AES.

of the second load (b) could be faster if it accesses the same cache line from (a). Similarly, another program may also infer the value of the secret through cache contention.

One existing software-based mitigation technique for preventing this cache timing channel is to preload the entire lookup table ahead of time [43]. Preloading allows a cache implementation to fill its entries with useful data based only on public addresses. However, this approach is not guaranteed to be secure on normal hardware; if a cache were too small to contain the entire table (or evicted entries for any other reason), it is possible that some lookups would trigger misses, thereby leaking information with an unexpectedly *slow* duration for certain keys. Other efforts to eliminate these problems with AES still rely on the assumption that certain instructions are constant-time [44].

Our ISA enables software to control microarchitectural timing channels in a principled manner. On hardware implementing our ISA, the secret-dependent loads in Figure 10 cannot affect public microarchitectural state and therefore cannot leak secret information through memory contention. Additionally, the strategy of preloading the cache can still improve performance on some implementations. One potential CPU implementation might maintain private and public cache partitions. During the preload phase, public and trusted code fills up the public cache partition with some or all of the AES table. During the encryption phase, secret code can read

```
# PCLBL = TLBL = (PUBLIC, UNTRUSTED)
# L(guess) = (PUBLIC, UNTRUSTED)
# L(pass) = (SECRET, TRUSTED)
dwncall check_pass
_____
# PCLBL = TLBL = (PUBLIC, TRUSTED)
check pass:
 endoreg guess, TRUSTED
 upcall est, ST, ST, end_check
# PCLBL = TLBL = (SECRET, TRUSTED)
 beq guess, pass, success
 li res 0
 upret
success:
 li res 1
 upret
end_check:
 declreg res, PUBLIC
 dwnret
```

Fig. 11. Password checking in the proposed ISA.

those entries but cannot modify them, instead making updates only to the private cache partition. This implementation would allow for a more secure and efficient AES execution. Nevertheless, the duration of the entire execution could leak some information about the secret key; this example also shows how software can use an upcall instruction to obscure that duration by providing an explicit end time (via the est argument in the example's upcall).

## A. Password Checker

In this example, we show how to implement a nonmalleable password checker which can be called by untrusted users with the dwncall instruction. The code for this checker is shown in Figure 11. This program starts in a public and untrusted context, which would be typical for an unauthenticated user. The untrusted user generates their guess and puts it into the register called guess. Then they use the dwncall instruction to call the check\_pass function and gain high integrity. This is analogous to executing a system call in a typical operating system, where the user program is linked with trusted libraries and jumps into that code.

Once the check\_pass function has started, it must endorse the user's guess, since a trusted pc cannot branch on low integrity data. In order to compare the secret password value with the guess, the program executes an upcall instruction to enter a timing-mitigated region. Inside that region, the program computes either a 1 or 0 based on whether or not the guess was right or wrong, and then returns. Finally, at the end of the check\_pass function, the result is declassified to public and the call gate exits back to the untrusted context.

If an untrusted user were to execute the check\_pass function like a normal function call, their attempts to endorse their own guess and upcall into a secret and trusted state would both fail. This example illustrates the nonmalleability guarantees and how trusted system code can be resident in the system but only accessible via call gates.

# IX. RELATED WORK

a) Software Information Flow Control: Software-based IFC has been applied in many settings with the goal of eliminating timing channels [11], [17], [45]-[50]. Kashyap et al. [48] discuss various software strategies for enforcing timing-sensitive noninterference. In particular, they focus on using lattice scheduling to ensure that the ordering of visible events does not leak secret information. Parsec [46] is a language for concurrent programming which, given a race-freedom analysis, ensures observational determinism, a noninterference condition for concurrent programs. Bedford et al. [17] have also shown how a hybrid IFC system can provide progress-sensitive noninterference, a weaker condition than timing sensitivity; it does not leak information based on which sets of outputs a program successfully produces. Secure multi-execution, where a program is executed multiple times at varying security levels, has also been used to prove timingsensitive noninterference [50]. LIO [47] is a Haskell-based language extension for mitigating both external and internal channels through the use of monadic computation and IFC. Of the aforementioned systems, only LIO handles external timing channels. Like our ISA, LIO provides a dynamic semantics for enforcing noninterference but lacks features such as downgrading and integrity tracking.<sup>11</sup> Additionally, it is a high-level language which requires a software runtime for its security, making it unsuitable as an ISA description.

b) Hardware-level information flow control: IFC techniques have also been used to build timing-safe hardware. While not focused on timing, Suh et al. [52] showed that processors could implement efficient information flow tracking. Caisson and Sapper [6], [33] provided a nested state machine abstraction for circuit design and proved that hardware built using those tools enforced timing-sensitive noninterference. More expressive HDLs that provide similar security guarantees have also been developed using dependent types [30], [53]. The Hyperflow processor [7] is a fully-featured implementation of a RISC-V CPU developed using these techniques.

c) Secure ISAs: While many of the above HW IFC systems presented CPUs and ISAs, they were focused on security guarantees about the circuits. None of them have proved security results for programs executing on top of their example abstractions. Ge et al. [12] have defined a set of properties they argue post-Spectre ISAs (called aISAs) must enforce to provide efficient, timing-sensitive security. These properties primarily focus on prescribing how an operating system can interact with the hardware to provide timing security. They refer to concrete mechanisms such as hardware partitioning and time multiplexing rather than the security properties that these mechanisms should aim to enforce. Our ISA provides more fundamental guarantees than those suggested in their

<sup>&</sup>lt;sup>11</sup>Follow-up work (e.g., [19], [51]) addresses some of these features.

work, but real implementations of our ISA would likely exhibit many of the properties they list.

Yu et al. [26] have built an ISA extension for "oblivious computing" and have proved probabilistic noninterference results. They have also built and measured the performance of a speculative, out-of-order processor using this ISA and demonstrated its performance improvements over more conservative techniques. Their ISA treats security as an optional component which software may opt-in to by labeling instruction operands as public or secret. This is promising evidence of the practicality of efficient microarchitectures for secure ISAs.

The work of Zhang et al. [11] on language-based timing mitigation defines a software-hardware contract based on "write labels" and "read labels" that almost directly parallel our  $pc_l$  and  $t_l$ . However, that contract requires well-typed programs that correctly specify write and read labels; the hardware itself is not assumed to enforce any restrictions on how these labels change over time. Furthermore, our ISA considers both confidentiality and integrity while enforcing nonmalleable downgrading. We do not require a fully trusted entity to perform timing mitigation: any upcall caller can implement their own mitigation algorithm in their own context.

d) OS-level information flow control: Asbestos [54] and HiStar [23] are two well known IFC operating systems. They do not assure timing safety. However, HiStar's notion of *gates* informed our call gate mechanism, but the restrictions on gates and the security guarantees differ from ours. NickelOS [36] has been recently developed using *intransitive noninterference*, which allows more flexible security policies than traditional IFC. However, NickelOS is not timing-sensitive and focuses on information flow exposed through OS APIs.

# X. CONCLUSION AND FUTURE WORK

In this paper, we have proposed an ISA that defines a contract between software and hardware that defines how information may or may not affect the timing of instructions. Importantly, it provides timing safety without requiring that instructions explicitly execute in worst-case time. As a byproduct, our proofs delineate conditions that hardware should satisfy, thus providing guidance to hardware designers.

We foresee many avenues for further research in the domain of timing secure ISAs. Modeling more ISA features such as exceptions, memory models, and other concurrency mechanisms can provide evidence toward the practicality of this approach to ISA design. Furthermore, it will help expose more potential side channels that exist throughout the complex environment of multicore processors.

Given this foundation, we can develop new instructions or instruction semantics that expose different timing characteristics, such as fixed-latency scratchpad memory [55] or other "obvlivious" computation [26]. Experimenting with these new ideas in the context of a nonmalleable ISA can also ensure that the security guarantees hold end to end.

The largest open question is how to formally verify that hardware implementations satisfy the properties defined in Section VI, allowing us to connect security guarantees of high-level languages and verified operating systems to the actual behavior of the underlying hardware. We think there are many opportunities to improve existing secure HDLs for finer grained downgrading (of both data and time), and to adapt hardware functional verification techniques to prove IFC properties of processors.

#### ACKNOWLEDGMENTS

We would like to thank Adrian Sampson, Ryan Doenges, Ethan Cecchetti, Josh Gancher, Rolph Recto, Tom Magrino, Sungbo Park and our reviewers for their constructive feedback. This work was partly sponsored by NSF grant CNS-1513797 and DARPA contract HR0011-18-C-0014. Opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.

#### REFERENCES

- P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," in 40th IEEE Symposium on Security and Privacy (S&P'19), 2019.
- [2] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, "Meltdown: Reading kernel memory from user space," in 27th USENIX Security Symposium (USENIX Security 18), 2018.
- [3] J. Van Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx, "Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution," in 27<sup>th</sup> USENIX Security Symp., 2018, pp. 991–1008.
- [4] F. McKeen, I. Alexandrovich, A. Berenzon, C. Rozas, H. Shafi, V. Shanbhogue, and U. Savagaonkar, "Innovative instructions and software model for isolated execution," in *Workshop on Hardware and Architectural Support for Security and Privacy*, 2013.
- [5] D. Zhang, Y. Wang, G. E. Suh, and A. C. Myers, "A hardware design language for timing-sensitive information-flow security," ACM SIGPLAN Notices, vol. 50, no. 4, pp. 503–516, 2015.
- [6] X. Li, M. Tiwari, J. K. Oberg, V. Kashyap, F. T. Chong, T. Sherwood, and B. Hardekopf, "Caisson: a hardware description language for secure information flow," in ACM SIGPLAN Notices, vol. 46, no. 6. ACM, 2011, pp. 109–120.
- [7] A. Ferraiuolo, M. Zhao, A. C. Myers, and G. E. Suh, "Hyperflow: A processor architecture for nonmalleable, timing-safe information flow security," in *Proceedings of the 25th ACM Conference on Computer* and Communications Security (CCS), 2018.
- [8] J. B. Almeida, M. Barbosa, G. Barthe, and F. Dupressoir, "Verifiable side-channel security of cryptographic implementations: constanttime MEE-CBC," Cryptology ePrint Archive, Report 2015/1241, 2015, https://eprint.iacr.org/2015/1241.
- [9] G. Barthe, G. Betarte, J. Campo, C. Luna, and D. Pichardie, "Systemlevel non-interference for constant-time cryptography," in *Proceedings of* the 2014 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '14. New York, NY, USA: ACM, 2014, pp. 1267– 1279. [Online]. Available: http://doi.acm.org/10.1145/2660267.2660283
- [10] C. Watt, J. Renner, N. Popescu, S. Cauligi, and D. Stefan, "CT-wasm: Type-driven secure cryptography for the web ecosystem," *Proc. ACM Program. Lang.*, vol. 3, no. POPL, pp. 77:1–77:29, Jan. 2019. [Online]. Available: http://doi.acm.org/10.1145/3290390
- [11] D. Zhang, A. Askarov, and A. C. Myers, "Language-based control and mitigation of timing channels," in *Proceedings* of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '12. New York, NY, USA: ACM, 2012, pp. 99–110. [Online]. Available: http://doi.acm.org/10.1145/2254064.2254078

- [12] Q. Ge, Y. Yarom, and G. Heiser, "No security without time protection: We need a new hardware-software contract," in *Proceedings of the 9th Asia-Pacific Workshop on Systems*, ser. APSys '18. New York, NY, USA: ACM, 2018, pp. 1:1–1:9. [Online]. Available: http://doi.acm.org/10.1145/3265723.3265724
- [13] A. Sabelfeld and A. C. Myers, "Language-based information-flow security," *IEEE Journal on selected areas in communications*, vol. 21, no. 1, pp. 5–19, 2003.
- [14] E. Cecchetti, A. C. Myers, and O. Arden, "Nonmalleable information flow control," in *Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2017, pp. 1875–1891.
- [15] A. Waterman, Y. Lee, D. A. Patterson, and K. Asanovi, "The RISC-V instruction set manual. volume 1: User-level ISA, version 2.0," CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGI-NEERING AND COMPUTER SCIENCES, Tech. Rep., 2014.
- [16] A. C. Myers, A. Sabelfeld, and S. Zdancewic, "Enforcing robust declassification and qualified robustness," *Journal of Computer Security*, vol. 14, no. 2, pp. 157–196, 2006.
- [17] A. Bedford, S. Chong, J. Desharnais, E. Kozyri, and N. Tawbi, "A progress-sensitive flow-sensitive inlined information-flow control monitor (extended version)," *Computers & Security*, vol. 71, pp. 114–131, 2017.
- [18] D. Zagieboylo, G. E. Suh, and A. C. Myers, "Using information flow to design an isa that controls timing channels," Cornell University, Tech. Rep., 2019.
- [19] P. Buiras, D. Stefan, and A. Russo, "On dynamic flow-sensitive floatinglabel systems," in 2014 IEEE 27th Computer Security Foundations Symposium. IEEE, 2014, pp. 65–79.
- [20] T. H. Austin and C. Flanagan, "Efficient purely-dynamic information flow analysis," in *Proceedings of the ACM SIGPLAN Fourth Workshop* on *Programming Languages and Analysis for Security*, ser. PLAS '09. New York, NY, USA: ACM, 2009, pp. 113–124. [Online]. Available: http://doi.acm.org/10.1145/1554339.1554353
- [21] A. Askarov, D. Zhang, and A. C. Myers, "Predictive black-box mitigation of timing channels," in *Proceedings of the 17th ACM conference* on Computer and communications security. ACM, 2010, pp. 297–307.
- [22] J. H. Saltzer, "Protection and the control of information sharing in Multics," *Communications of the ACM*, vol. 17, no. 7, pp. 388–402, 1974.
- [23] N. Zeldovich, S. Boyd-Wickizer, E. Kohler, and D. Mazières, "Making information flow explicit in HiStar," in *Proceedings* of the 7th Symposium on Operating Systems Design and Implementation, ser. OSDI '06. Berkeley, CA, USA: USENIX Association, 2006, pp. 263–278. [Online]. Available: http://dl.acm.org/citation.cfm?id=1298455.1298481
- [24] J. Van Bulck, F. Piessens, and R. Strackx, "Nemesis: Studying microarchitectural timing leaks in rudimentary CPU interrupt logic."
- [25] A. Sabelfeld and D. Sands, "Dimensions and principles of declassification," in 18th IEEE Computer Security Foundations Workshop (CSFW'05). IEEE, 2005, pp. 255–269.
- [26] J. Yu, L. Hsiung, M. El Hajj, and C. W. Fletcher, "Data oblivious ISA extensions for side channel-resistant and high performance computing."
- [27] L. Zheng and A. C. Myers, "Dynamic security labels and noninterference," in *Formal Aspects in Security and Trust.* Springer, 2005, pp. 27–40.
- [28] S. Herbert and D. Marculescu, "Analysis of dynamic voltage/frequency scaling in chip-multiprocessors," in *Int'l Symp. on Low Power Electronics and Design*, 2007.
- [29] M. Backes and B. Pfitzmann, "Computational probabilistic noninterference," *International Journal of Information Security*, vol. 3, no. 1, pp. 42–60, 2004.
- [30] A. Ferraiuolo, "Security results for SIRRTL, a hardware description language for information flow security," 2017.
- [31] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis, J. Wawrzynek, and K. Asanović, "Chisel: constructing hardware in a scala embedded language," in *Design Automation Conference (DAC)*, 2012 49th ACM/EDAC/IEEE. IEEE, 2012, pp. 1212–1221.
  [32] A. Ferraiuolo, R. Xu, D. Zhang, A. C. Myers, and G. E.
- [32] A. Ferraiuolo, R. Xu, D. Zhang, A. C. Myers, and G. E. Suh, "Verification of a practical hardware security architecture through static information flow analysis," in *Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems*, ser. ASPLOS '17. New York, NY, USA: ACM, 2017, pp. 555–568. [Online]. Available: http://doi.acm.org/10.1145/3037697.3037739

- [33] X. Li, V. Kashyap, J. K. Oberg, M. Tiwari, V. R. Rajarathinam, R. Kastner, T. Sherwood, B. Hardekopf, and F. T. Chong, "Sapper: A language for hardware-level security policy enforcement," in ACM SIGPLAN Notices, vol. 49, no. 4. ACM, 2014, pp. 97–112.
- [34] D. Lustig, M. Pellauer, and M. Martonosi, "PipeCheck: Specifying and verifying microarchitectural enforcement of memory consistency models," in 47th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-47. Washington, DC, USA: IEEE Computer Society, 2014, pp. 635–646. [Online]. Available: http://dx.doi.org/10.1109/MICRO.2014.38
- [35] Y. A. Manerkar, D. Lustig, M. Martonosi, and M. Pellauer, "RTLcheck: Verifying the memory consistency of RTL designs," in *Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture*, ser. MICRO-50 '17. New York, NY, USA: ACM, 2017, pp. 463–476. [Online]. Available: http://doi.acm.org/10.1145/3123939.3124536
- [36] H. Sigurbjarnarson, L. Nelson, B. Castro-Karney, J. Bornholt, E. Torlak, and X. Wang, "Nickel: a framework for design and verification of information flow control systems," in 13<sup>th</sup> USENIX Symp. on Operating Systems Design and Implementation (OSDI), 2018, pp. 287–305.
- [37] J. Daemen and V. Rijmen, "AES proposal: Rijndael," 1999.
- [38] D. J. Bernstein, "Cache-timing attacks on AES."
- [39] G. Irazoqui, M. S. Inci, T. Eisenbarth, and B. Sunar, "Wait a minute! A fast, cross-VM attack on AES," in *International Workshop on Recent Advances in Intrusion Detection*. Springer, 2014, pp. 299–319.
- [40] R. Spreitzer and T. Plos, "Cache-access pattern attack on disaligned AES T-tables," in *International Workshop on Constructive Side-Channel Analysis and Secure Design*. Springer, 2013, pp. 200–214.
- [41] B. Gülmezoğlu, M. S. Inci, G. Irazoqui, T. Eisenbarth, and B. Sunar, "A faster and more realistic flush+ reload attack on AES," in *International Workshop on Constructive Side-Channel Analysis and Secure Design*. Springer, 2015, pp. 111–126.
- [42] D. A. Osvik, A. Shamir, and E. Tromer, "Cache attacks and countermeasures: The case of AES," in *Topics in Cryptology – CT-RSA 2006*, D. Pointcheval, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 1–20.
- [43] E. Brickell, G. Graunke, M. Neve, and J.-P. Seifert, "Software mitigations to hedge AES against cache-based software side channel vulnerabilities." *IACR Cryptology ePrint Archive*, vol. 2006, p. 52, 2006.
- [44] E. Käsper and P. Schwabe, "Faster and timing-attack resistant AES-GCM," in *Cryptographic Hardware and Embedded Systems-CHES 2009*. Springer, 2009, pp. 1–17.
- [45] D. Volpano and G. Smith, "Eliminating covert flows with minimum typings," in *Computer Security Foundations Workshop*, *IEEE(CSFW)*, vol. 00, 06 1997, p. 156. [Online]. Available: doi.ieeecomputersociety.org/10.1109/CSFW.1997.596807
- [46] S. Zdancewic and A. C. Myers, "Observational determinism for concurrent program security," in *Computer Security Foundations Workshop*, 2003. Proceedings. 16th IEEE. IEEE, 2003, pp. 29–43.
- [47] D. Stefan, A. Russo, P. Buiras, A. Levy, J. C. Mitchell, and D. Maziéres, "Addressing covert termination and timing channels in concurrent information flow systems," in *Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming*, ser. ICFP '12. New York, NY, USA: ACM, 2012, pp. 201–214. [Online]. Available: http://doi.acm.org/10.1145/2364527.2364557
- [48] V. Kashyap, B. Wiedermann, and B. Hardekopf, "Timing-and termination-sensitive secure information flow: Exploring a new approach," in 2011 IEEE Symposium on Security and Privacy. IEEE, 2011, pp. 413–428.
- [49] J. Agat, "Transforming out timing leaks," in 27<sup>th</sup> ACM Symp. on Principles of Programming Languages (POPL), Jan. 2000, pp. 40–53. [Online]. Available: http://dl.acm.org/citation.cfm?id=325694.325702
- [50] D. Devriese and F. Piessens, "Noninterference through secure multiexecution," in 2010 IEEE Symposium on Security and Privacy. IEEE, 2010, pp. 109–124.
- [51] P. Buiras, D. Vytiniotis, and A. Russo, "HLIO: Mixing static and dynamic typing for information-flow control in Haskell," in *Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming*, ser. ICFP 2015. New York, NY, USA: ACM, 2015, pp. 289–301. [Online]. Available: http://doi.acm.org/10.1145/2784731.2784758
- [52] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, "Secure program execution via dynamic information flow tracking," in ACM Sigplan Notices, vol. 39, no. 11. ACM, 2004, pp. 85–96.

- [53] A. Ferraiuolo, W. Hua, A. C. Myers, and G. E. Suh, "Secure information flow verification with mutable dependent types," in *Proceedings of the 54th Annual Design Automation Conference 2017*, ser. DAC '17. New York, NY, USA: ACM, 2017, pp. 6:1–6:6. [Online]. Available: http://doi.acm.org/10.1145/3061639.3062316
  [54] P. Efstathopoulos, M. Krohn, S. VanDeBogart, C. Frey, D. Ziegler, E. Kohler, D. Mazieres, F. Kaashoek, and R. Morris, "Labels and event processes in the Ashesto operating system" in ACM SIGOPS Operating
- processes in the Asbestos operating system," in ACM SIGOPS Operating Systems Review, vol. 39, no. 5. ACM, 2005, pp. 17–30. [55] R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel,
- K. Banakat, S. Steinke, B.-S. Lee, M. Banakishinan, and F. Matweler, "Scratchpad memory: A design alternative for cache on-chip memory in embedded systems," in *Hardware/Software Codesign*, 2002. CODES 2002. Proceedings of the Tenth International Symposium on. IEEE, 2002, pp. 73–78.