
1 Introduction

Nowadays, many applications (apps) manipulate users’ private data. Such apps could have been written by anyone, and users who wish to benefit from their functionality are forced to grant them access to their data—something that most users will do without a second thought [21]. Once apps collect users’ information, there are no guarantees about how they handle it, thus leaving room for data theft and data breaches by malicious apps. The key to guaranteeing security without sacrificing functionality is not granting or denying access to sensitive data, but rather ensuring that information only flows into appropriate places.

Information-flow control (IFC) [32] is a promising programming language-based approach to enforcing security. IFC scrutinizes how data of different sensitivity levels (e.g., public or private) flows within a program, and raises alarms when there is an unsafe flow of information. Most IFC tools require the design of new languages, compilers, interpreters, or modifications to the runtime, e.g., [4, 24, 26, 29]. In this scenario, the functional programming language Haskell plays a uniquely privileged role: it is able to enforce security via libraries [18] by using an embedded domain-specific language.

Fig. 1. Public computation

Many of the state-of-the-art Haskell security libraries, namely LIO [37], HLIO [6], and MAC [31], bring ideas from Mandatory Access Control [3] into a language-based setting. Every computation in such libraries has a current label which is used to (i) approximate the sensitivity level of all the data in scope and (ii) restrict subsequent side-effects which might compromise security. From now on, we simply use the term libraries when referring to LIO, HLIO, and MAC.

Fig. 2. Labeled values

IFC uses labels to model the sensitivity of data, which are then organized in a security lattice [7] specifying the allowed flows of information, i.e., \(\ell _{1}\;\sqsubseteq \;\ell _{2}\) means that data with label \(\ell _{1}\) can flow into entities labeled with \(\ell _{2}\). Although these libraries are parameterized on the security lattice, for simplicity we focus on the classic two-point lattice with labels \(\mathtt {H}\) and \(\mathtt {L}\) to respectively denote secret (high) and public (low) data, and where \(\mathtt {H}\;\not \sqsubseteq \;\mathtt {L}\) is the only disallowed flow. Figure 1 shows a graphical representation of a public computation in these libraries, i.e. a computation with current label \(\mathtt {L}\). The computation can read or write data in scope, which is considered public (e.g., an average temperature of 17\(^\circ \)C in the Swedish summer), and it can write to public (\(\mathtt {L}\)-) or secret (\(\mathtt {H}\)-) sinks. By contrast, a secret computation, i.e. a computation with current label \(\mathtt {H}\), can also read and write data in its scope, which is considered sensitive, but in order to prevent information leaks it can only write to sensitive/secret sinks. Structuring computations in this manner ensures that sensitive data does not flow into public entities, a policy known as noninterference [10]. While secure, this programming model can be overly restrictive for users who want to manipulate differently-labeled values.

To address this shortcoming, the libraries introduce the notion of a labeled value: an abstract data type which protects values with explicit labels, in addition to the current label. Figure 2 shows a public computation with access to both public and sensitive pieces of information, such as a password (\( pwd \)). Public computations can freely manipulate sensitive labeled values provided that they are treated as black boxes, i.e. they can be stored, retrieved, and passed around as long as their contents are not inspected. The libraries LIO and HLIO even allow public computations to inspect the contents of sensitive labeled values, raising the current label to \(\mathtt {H}\) to keep track of the fact that a secret is in scope—this variant is known as a floating-label system.

Reading sensitive data usually amounts to “tainting” the entire context or ensuring the context is as sensitive as the data being observed. As a result, the system is susceptible to an issue known as label creep: reading too many secrets may cause the current label to be so high in the lattice that the computation can no longer perform any useful side effects. To address this problem, libraries provide a primitive which enables public computations to spawn sub-computations that access sensitive labeled values without tainting the parent. In a sequential setting, such sub-computations are implemented by special function calls. In the presence of concurrency, however, they must be executed in a different thread to avoid compromising security through internal timing and termination covert channels [36].

Practical programs need to manipulate sensitive labeled values by transforming them. It is quite common for these operations to be naturally free of I/O or other side effects, e.g., arithmetic or algebraic operations, especially in applications like image processing, cryptography, or data aggregation for statistical purposes. Writing such functions, known as pure functions, is the bread and butter of the functional programming style, and is known to improve programmer productivity, encourage code reuse, and reduce the likelihood of bugs [14]. Nevertheless, the programming model involving sub-computations that manipulate secrets forces an imperative style, whereby computations must be structured into separate compartments that communicate explicitly. While side-effecting instructions have an underlying structure (called a monad [22]), the research literature has neglected studying structures for labeled values and their consequences for the programming model. To empower programmers with the simpler, functional style, we propose additional operations that allow pure functions to securely manipulate labeled values, specifically by means of a structure similar to applicative functors [20]. In particular, this structure is useful in concurrent settings, where it is no longer necessary to spawn threads to manipulate sensitive data, thus making the code less imperative (i.e., side-effect free). Interestingly, the evaluation strategy of the host language (i.e. call-by-value or call-by-name) affects the validity of our security guarantees. Specifically, call-by-name turns out to naturally enforce progress-sensitive non-interference in a concurrent setting.

Additionally, practical programs often aggregate information from heterogeneous sources. For that, programs need to upgrade labeled values to an upper bound of the labels involved before the data can be combined. In previous incarnations of the libraries, such relabelings require spawning threads just for that purpose. As before, the reason is that the libraries decouple every computation which manipulates sensitive data—even those that simply relabel—so that the internal timing and termination covert channels pose no threat. In this light, we introduce a primitive to securely relabel labeled values, which can be applied irrespective of the computation’s current label and does not require spawning threads.

We provide a mechanized security proof for the security library MAC and claim our results also apply to LIO and HLIO. MAC has fewer lines of code and leverages types to enforce confidentiality, thus making it ideal for modeling its semantics in a dependently-typed language like Agda. The contributions of this paper are: (i) we introduce a functor structure equipped with an applicative operator that enables users to conveniently manipulate and combine labeled values using pure functions, encouraging a more functional (side-effect free) programming style; (ii) we introduce a relabeling primitive that securely modifies the label of labeled values, bypassing the need to spawn threads when aggregating heterogeneous data; (iii) we identify and discuss the impact of the evaluation strategy of the host language on the security of the applicative operators in MAC with respect to the internal timing and termination covert channels; (iv) we implement a prototype of our ideas in the MAC library (Footnote 1); and (v) we formalize MAC with secure applicative operators as a \(\lambda \)-calculus, providing a mechanized proof in Agda of progress-insensitive (PINI) and progress-sensitive noninterference (PSNI) [1] for the sequential and (respectively) concurrent setting.

This paper is organized as follows. Section 2 describes the core aspects of MAC. Sections 3 and 4 present functors, applicative operators, and relabeling operations. Section 5 gives formal guarantees. Section 6 discusses related work and Sect. 7 concludes.

Fig. 3. Simplified API for MAC

2 Background

In MAC, each label is represented as an abstract data type. Figure 3 shows the core part of MAC’s API. Abstract data type \( Labeled \;\ell \; a \) classifies data of type \( a \) with a security label \(\ell \). For instance, \( Labeled \;\mathtt {H}\; Int \) is a sensitive integer, while \( Labeled \;\mathtt {L}\; String \) is a public string. (Symbol \(\mathbin {::}\) is used to describe the type of terms in Haskell.) Abstract data type \( MAC \;\ell \; a \) denotes a (possibly) side-effectful secure computation which handles information at sensitivity level \(\ell \) and yields a value of type \( a \) as a result. A \( MAC \;\ell \; a \) computation enjoys a monadic structure, i.e. it is built using the fundamental operations \( return \mathbin {::} a \rightarrow MAC \;\ell \; a \) and \((\mathbin {>\!\!\!>=})\mathbin {::} MAC \;\ell \; a \rightarrow ( a \rightarrow MAC \;\ell \; b )\rightarrow MAC \;\ell \; b \) (read as “bind”). The operation \( return \; x \) creates a computation that returns the value denoted by \( x \) and performs no side-effects. The function \((\mathbin {>\!\!\!>=})\) is used to sequence computations and their corresponding side-effects. Specifically, \( m \mathbin {>\!\!\!>=} f \) takes a computation \( m \) and a function \( f \), which is applied to the result produced by running \( m \), and yields the resulting computation. We sometimes use Haskell’s do-notation to write such monadic computations. For example, the program \( m \mathbin {>\!\!\!>=}\lambda x \rightarrow return \;( x \mathbin {+}\mathrm {1})\), which adds \(\mathrm {1}\) to the value produced by \( m \), can be written as shown in Fig. 4.
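To make the monadic interface concrete, the following sketch encodes \( MAC \;\ell \; a \) as a thin wrapper around \( IO \); the constructor MkMAC and the IO-based representation are our own assumptions for illustration and do not reflect MAC’s actual internals:

```haskell
-- A minimal sketch of MAC's monadic structure, assuming a hypothetical
-- representation of MAC l as a wrapper around IO; the label l is a
-- phantom type parameter. MAC's real internals differ.
newtype MAC l a = MkMAC (IO a)

instance Functor (MAC l) where
  fmap f (MkMAC io) = MkMAC (fmap f io)

instance Applicative (MAC l) where
  pure = MkMAC . pure
  MkMAC f <*> MkMAC x = MkMAC (f <*> x)

instance Monad (MAC l) where
  MkMAC io >>= f = MkMAC (io >>= \a -> case f a of MkMAC io' -> io')

-- The program m >>= \x -> return (x + 1), written with do-notation:
addOne :: MAC l Int -> MAC l Int
addOne m = do x <- m
              return (x + 1)
```

Because the label is a phantom parameter, sequencing with \((\mathbin {>\!\!\!>=})\) stays at a single sensitivity level, exactly as the types in Fig. 3 dictate.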

Fig. 4. do-notation

Secure flows of information. Generally speaking, side-effects in a \( MAC \;\ell \; a \) computation can be seen as actions which either read or write data. Such actions, however, need to be conceived in a manner that respects the sensitivity of the computations’ results as well as the sensitivity of sources and sinks of information modeled as labeled values. The functions \( label \) and \( unlabel \) allow \( MAC \;\ell \; a \) computations to securely interact with labeled values. To help readers, we indicate the relationship between type variables in their subindexes, i.e. we use \(\ell _{\mathrm {L}}\) and \(\ell _{\mathrm {H}}\) to attest that \(\ell _{\mathrm {L}}\;\sqsubseteq \;\ell _{\mathrm {H}}\). If a computation writes data into a sink, the computation should have at most the sensitivity of the sink itself. This restriction, known as no write-down [3], respects the sensitivity of the data sink, e.g., the sink never receives data more sensitive than its label. Function \( label \) creates a fresh labeled value, which from the security point of view can be seen as allocating a fresh location in memory and immediately writing a value into it—thus, the no write-down principle applies. In the type signature of \( label \), what appears on the left-hand side of the symbol \(\Rightarrow \) are type constraints. They represent properties that must be statically fulfilled about the types appearing on the right-hand side of \(\Rightarrow \). The type constraint \(\ell _{\mathrm {L}}\;\sqsubseteq \;\ell _{\mathrm {H}}\) ensures that when calling \( label \; x \) (for some \( x \) in scope), the computation creates a labeled value only if \(\ell _{\mathrm {L}}\), i.e. the current label of the computation, is no more confidential than \(\ell _{\mathrm {H}}\), i.e. the sensitivity of the created labeled value. In contrast, a computation is only allowed to read labeled values at most as sensitive as its current label—observe the type constraint in the type signature of \( unlabel \). This restriction, known as no read-up [3], protects the confidentiality degree of the result produced by a \( MAC \;\ell _{\mathrm {H}}\) computation, i.e. the result might only involve data which is, at most, as sensitive as \(\ell _{\mathrm {H}}\).
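The no write-down and no read-up disciplines can be mimicked with ordinary type classes. The encoding below (the Flows class, the MkLabeled and MkMAC constructors, and the pure representation) is a hypothetical sketch for illustration, not MAC’s implementation:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
-- Hypothetical encoding: labels as empty types, the partial order ⊑ as a
-- type class with one instance per allowed flow; side effects are omitted.
data L   -- public (low)
data H   -- secret (high)

class Flows l l'          -- l ⊑ l'
instance Flows L L
instance Flows L H        -- public data may flow into secret entities
instance Flows H H        -- note: there is no instance Flows H L

newtype Labeled l a = MkLabeled a
newtype MAC l a = MkMAC a

-- no write-down: the current label l must flow to the new value's label l'
label :: Flows l l' => a -> MAC l (Labeled l' a)
label = MkMAC . MkLabeled

-- no read-up: the value's label l must flow to the current label l'
unlabel :: Flows l l' => Labeled l a -> MAC l' a
unlabel (MkLabeled a) = MkMAC a
```

With this sketch, `label x :: MAC L (Labeled H a)` type-checks (public computations may create secrets), while asking for `unlabel` of a secret inside a public computation would require the missing instance `Flows H L` and is rejected statically.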

Fig. 5. Implicit flows are ill-typed.

The interaction between the current label of a computation and the no write-down restriction makes implicit flows ill-typed, as shown in Fig. 5. In order to branch on sensitive data, a program first needs to unlabel it, thus requiring the computation to be of type \( MAC \;\mathtt {H}\; a \) (for some type \( a \)). From that point on, the computation cannot write to public data regardless of the branch taken. MAC provides additional primitives responsible for producing useful side-effects like exception handling, network communication, references, and synchronization primitives—we refer the interested reader to [31] for further details.
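The rejection of implicit flows can be reproduced in a self-contained sketch. Assuming a hypothetical type-class encoding of the two-point lattice (our names, not MAC’s internals), branching on a secret forces the computation to level H, where label can no longer target L:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
-- Hypothetical encoding of the two-point lattice (illustrative only).
data L; data H
class Flows l l'
instance Flows L L
instance Flows L H
instance Flows H H        -- crucially, no instance Flows H L

newtype Labeled l a = MkLabeled a
newtype MAC l a = MkMAC a

unlabel :: Flows l l' => Labeled l a -> MAC l' a
unlabel (MkLabeled a) = MkMAC a

label :: Flows l l' => a -> MAC l (Labeled l' a)
label = MkMAC . MkLabeled

-- Branching on the secret is fine inside a secret computation:
observe :: Labeled H Bool -> MAC H Bool
observe secret = let MkMAC b = unlabel secret :: MAC H Bool
                 in MkMAC (if b then True else False)

-- The implicit flow is ill-typed: uncommenting the definition below makes
-- GHC report that there is no instance for Flows H L.
-- impFlow :: Labeled H Bool -> MAC H (Labeled L Bool)
-- impFlow secret = let MkMAC b = unlabel secret :: MAC H Bool
--                  in label b
```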

Handling data with different sensitivity. Programs handling data with heterogeneous labels necessarily involve nested \( MAC \;\ell \; a \) computations in their return types. For instance, consider a piece of code \( m \) which handles both public and secret information: its type indicates that it returns a public string together with a sensitive computation. While somewhat manageable for a two-point lattice, this becomes intractable in the general case. In a sequential setting, MAC provides the primitive \( join ^{\textsf {{\tiny MAC}}}\) to safely integrate more sensitive computations into less sensitive ones—see Fig. 3. Operationally, function \( join ^{\textsf {{\tiny MAC}}}\) runs the more sensitive computation and wraps the result into a labeled expression to protect its sensitivity. As we will show in Sect. 5, Haskell programs written using the monadic API, \( label \), \( unlabel \), and \( join ^{\textsf {{\tiny MAC}}}\) satisfy PINI, where leaks due to non-termination of programs are ignored. This design decision is similar to that taken by mainstream IFC compilers (e.g., [11, 25, 34]), where the most effective manner to exploit termination takes time exponential in the size (in bits) of the secret [1].
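The type discipline of \( join ^{\textsf {{\tiny MAC}}}\) can be sketched as follows; the pure representation and the stub body are our assumptions, only the signature matters here:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
-- Hypothetical encoding (illustrative, not MAC's internals).
data L; data H
class Flows l l'
instance Flows L L
instance Flows L H
instance Flows H H

newtype Labeled l a = MkLabeled a
newtype MAC l a = MkMAC a

-- join^MAC runs a more sensitive computation from a less sensitive one
-- and protects its result with a label (cf. Fig. 3).
joinMAC :: Flows l l' => MAC l' a -> MAC l (Labeled l' a)
joinMAC (MkMAC a) = MkMAC (MkLabeled a)
```

Note how the result of the sensitive computation never escapes unprotected: the caller at level \( l \) only obtains a \( Labeled \; l' \; a \).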

Fig. 6. Termination leak

Concurrency. The mere possibility to run (conceptually) simultaneous \( MAC \;\ell \) computations provides attackers with new tools to bypass security checks. In particular, the presence of threads introduces the internal timing covert channel, a channel that gets exploited when, depending on secrets, the timing behavior of threads affects the order of events performed on public shared resources [35]. Furthermore, concurrency magnifies the bandwidth of the termination covert channel to be linear in the size (in bits) of secrets [36]. Since the same countermeasure closes both covert channels, we focus on the latter. What constitutes a termination leak is the fact that a non-terminating secret computation can suppress the execution of subsequent public events. To illustrate this point, we present the attack in Fig. 6. We assume that there exists a function \( publish \) which sends an integer to a public blog. Observe how function \( leak \) may suppress subsequent public events with infinite loops. If a thread runs \( leak \;\mathrm {0}\; secret \), the code publishes \(\mathrm {0}\) only if the first bit of \( secret \) is \(\mathrm {0}\); otherwise it loops (see function \( loop \)) and does not produce any public effect. Similarly, a thread running \( leak \;\mathrm {1}\; secret \) leaks the second bit of \( secret \), a thread running \( leak \;\mathrm {2}\; secret \) leaks the third bit, and so on. To securely support concurrency, MAC forces programmers to decouple computations which depend on sensitive data from those performing public side-effects. As a result, non-terminating loops based on secrets cannot affect the outcome of public events. To achieve this behavior, MAC replaces \( join ^{\textsf {{\tiny MAC}}}\) by \( fork ^{\textsf {{\tiny MAC}}}\) as defined in Fig. 3. It is secure to spawn sensitive computations from non-sensitive ones because the decision to do so depends only on non-sensitive data.
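A sketch of \( fork ^{\textsf {{\tiny MAC}}}\) makes the countermeasure concrete: the sensitive computation runs in its own thread, so its (non-)termination cannot suppress later public events. The encoding (Flows, MkMAC over IO) is hypothetical and for illustration only:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
-- Hypothetical encoding (illustrative, not MAC's internals).
import Control.Concurrent

data L; data H
class Flows l l'
instance Flows L L
instance Flows L H
instance Flows H H

newtype MAC l a = MkMAC (IO a)

-- Spawning a more sensitive computation returns unit immediately: the
-- parent learns nothing from the child, not even whether it terminates.
forkMAC :: Flows l l' => MAC l' () -> MAC l ()
forkMAC (MkMAC io) = MkMAC (forkIO io >> pure ())
```

Compared with \( join ^{\textsf {{\tiny MAC}}}\), note that \( forkMAC \) returns \(()\) rather than a labeled result, which is precisely what decouples secret-dependent loops from public events.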

Example 1

To show how to program using MAC, we present a simple scenario where Alice writes an API that helps users prepare and file their taxes. Alice models a tax declaration as a value of type \( TaxDecl \), which is obtained from users’ personal information—modeled as a value of type \( Data \). She releases the first version of the API:

figure a

We remark that, although we focus on this API for simplicity, Alice is using the concurrent version of MAC. Function \( declareTaxes \) does two things: it fills out the tax forms (function \( fillTaxes \)) and sends them to the corresponding government agency (function \( send \)). Due to the use of \( send \), function \( declareTaxes \) returns a computation in the \( IO \)-monad—a special data type which permits arbitrary I/O effects in Haskell. Function \( send \) generates a valid PDF for tax declarations and sends it to the corresponding authorities. However, there is nothing stopping this function from leaking tax information to unauthorized entities over the network. Alice’s customers notice this problem and are concerned about how their sensitive data gets handled by the API.

Alice then decides to adapt the API to use MAC. For simplicity, we assume that MAC also includes a secure operation to send data over the network:

figure b

This primitive sends a labeled value of type \( a \) to the URL given as an argument, e.g., via an HTTP request or another network protocol. Using MAC’s concurrent API and the primitive \( send ^{\textsf {{\tiny MAC}}}\), Alice rewrites her API to adhere to the following interface.

figure c

Observe that Alice’s API needs to spawn a secret computation in order to unlabel and access the user’s data (\( user \)). Once the user’s data is accessible, a pure function is applied to it (\( fillTaxes \; info \)), the result is labeled again (\( tax \)), and a side-effectful action takes place (\( send ^{\textsf {{\tiny MAC}}}\)). In the next section we extend MAC’s API so that it is possible to manipulate labeled values with pure functions, like \( fillTaxes \), and perform side-effectful actions, like \( send ^{\textsf {{\tiny MAC}}}\), without the need to spawn threads.

3 Functors for Labeled Values

In this section, we show how labeled values can be manipulated using functors.

Fig. 7. Functor structure for labeled values

Intuitively, a functor is a container-like data structure which provides a method called \( fmap \) that applies (maps) a function over its content, while preserving its structure. Lists are the canonical example of a functor data structure. In this case, \( fmap \) corresponds to the function \( map \), which applies a function to each element of a list, e.g. \( fmap \;(\mathbin {+}\mathrm {1})\;[\mathrm {1},\mathrm {2},\mathrm {3}]\equiv [\mathrm {2},\mathrm {3},\mathrm {4}]\). A functor structure for labeled values allows programmers to manipulate sensitive data without explicitly extracting it—see Fig. 7. For instance, \( fmap \;(\mathbin {+}\mathrm {1})\; d \), where \( d \) stores the number \(\mathrm {42}\), produces the number 43 as a sensitive labeled value. Observe that sensitive data gets manipulated without the need to use \( label \) and \( unlabel \), thus avoiding their overhead (no security checks are performed). Despite what intuition might suggest, it is possible to securely apply \( fmap \) in any \( MAC \;\ell \) computation to any labeled value, irrespective of its security level. A secure implementation of \( fmap \) then allows manipulation of data without forking threads in a concurrent setting—thus introducing flexibility when data is processed by pure (side-effect free) functions. However, obtaining a secure implementation of \( fmap \) requires a careful analysis of its security implications.
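The functor structure itself is tiny. The sketch below (hypothetical representation, not MAC’s internals) shows why no security check is needed: \( fmap \) transforms the protected value while the phantom label is preserved by the types alone:

```haskell
-- Hypothetical representation of labeled values (illustrative only).
data H   -- example secret label

newtype Labeled l a = MkLabeled a

-- fmap transforms the content but cannot change the (phantom) label l,
-- so the result is exactly as sensitive as the input.
instance Functor (Labeled l) where
  fmap f (MkLabeled a) = MkLabeled (f a)

-- fmap (+1) on a secret 42 yields a secret 43:
example :: Labeled H Int
example = fmap (+ 1) (MkLabeled 42)
```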

Fig. 8. Termination leak under call-by-value evaluation

Fig. 9. Security guarantees

Interestingly, the evaluation strategy of the programming language and the choice between a sequential or concurrent setting determine different security guarantees in the presence of \( fmap \). Figure 9 shows our findings. In a sequential setting with call-by-value semantics, \( fmap \) can be exploited to create a termination covert channel in a similar manner as done with \( join ^{\textsf {{\tiny MAC}}}\). To illustrate this point, we rephrase the attack in Fig. 6 to use \( fmap \) rather than \( join ^{\textsf {{\tiny MAC}}}\)—see Fig. 8. Under a call-by-value evaluation strategy, the function \( loopOn \) passed to \( fmap \) is eagerly applied to the secret, which might introduce a loop depending on the value of the \( n \)-th bit of the secret—a termination leak. Under a call-by-name evaluation strategy, however, \( loopOn \) is not immediately evaluated, since \( result \) is not needed for computing \( publish \; n \). Therefore, \( publish \; n \) gets executed independently of the value of the secret, i.e. no termination leaks are introduced. Instead, \( loopOn \) gets evaluated when “unlabeling” \( result \) and inspecting its value in a secret computation, which is secure. Although functors can be used to exploit non-termination of programs, they impose no new risks for sequential programs (MAC already ignores termination leaks in that setting).

Fig. 10. Attack magnification

Unfortunately, a call-by-value concurrent semantics magnifies the bandwidth of the attack in Fig. 8 to the point where confidentiality can be systematically and efficiently broken—see Fig. 10. Assuming a 100-bit secret, the magnification consists of leaking the whole secret by spawning a sufficient number of threads—each of them leaking a different bit. Since \( leak \) cannot exploit the termination channel under a call-by-name evaluation strategy, the magnification attack becomes vacuous under such semantics. More precisely, the attack can only trigger the execution of \( leak \) by first unlabeling \( result \), an operation impossible to perform in a public computation—recall there is no \( join ^{\textsf {{\tiny MAC}}}\) primitive for concurrent programs. As the table suggests, call-by-name gives the strongest security guarantees when extending MAC with functors. We remark that it is possible to close this termination channel under a call-by-value semantics by defining \( Labeled \) with an explicit suspension, e.g. \(\mathbf {data}\; Labeled \;\ell \; a \mathrel {=} Labeled \;(()\rightarrow a )\), and a corresponding forcing operation, so that \( fmap \) behaves lazily as desired.
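The suspension-based fix mentioned above can be sketched directly (the `force` operation is our illustrative name): \( fmap \) only composes functions under the thunk, so a looping function is not run until the value is forced, which corresponds to unlabeling and may only happen in a sufficiently sensitive computation:

```haskell
-- Closing the call-by-value termination channel with an explicit
-- suspension, as suggested in the text (sketch; names are illustrative).
data Labeled l a = Labeled (() -> a)

instance Functor (Labeled l) where
  fmap f (Labeled s) = Labeled (\() -> f (s ()))   -- f is not applied yet

-- Forcing corresponds to unlabeling the value.
force :: Labeled l a -> a
force (Labeled s) = s ()
```

Even under eager evaluation, `fmap loopOn secret` now merely builds a larger thunk; the loop can only run where forcing (unlabeling) is permitted.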

Example 2

Alice realizes that she could spare her API from forking threads by exploiting the functorial structure of labeled values.

figure d

The construct \( fmap \) applies the function \( fillTaxes \) without requiring use of \( unlabel \), while keeping the result securely encapsulated in a labeled value. Observe how the code is much less imperative, since there is no need to fork a thread to unlabel sensitive data just to apply a pure function to it.

Fig. 11. Lattice.

While functors help make the code more functional, there are still other programming patterns which drive developers to fork threads for security reasons rather than from a need for multi-threading. Specifically, when aggregating data from sources with incomparable labels, computations are forced to spawn a thread with a sufficiently high label. To illustrate this point, we present the following example.

Example 3

Alice knows that there is a third-party API which provides financial planning and she would gladly incorporate its functionality into her API. However, Alice wants to keep the third-party code isolated from hers, while still providing functionality to the user. To do so, she incorporates a new label for the third-party code into the system and modifies the lattice as shown in Fig. 11. The lattice reflects the mistrust that Alice has of the third-party code by making the new label and the secret label incomparable elements.

Alice’s API is extended with the third-party code as follows.

figure e

Function \( reportPlan \) needs to fork a thread in order to unlabel the third-party code (\( financialPlan \)).

figure f

In the next section, we show how to avoid forking threads in this kind of scenario.

4 Applicative Operator and Relabeling

To aggregate sensitivity-heterogeneous data without forking, we further extend the API with the primitives shown in Fig. 12. Primitive \( relabel \) copies, and possibly upgrades, a labeled value. This primitive is useful to “lift” data to an upper bound of all the labels involved in a computation prior to combining the data. Operator \((\langle * \rangle )\) supports function application within a labeled value, i.e. it allows feeding functions wrapped in a labeled value (\( Labeled \;\ell \;( a \rightarrow b )\)) with arguments also wrapped (\( Labeled \;\ell \; a \)), where aggregated results get wrapped as well (\( Labeled \;\ell \; b \)). We demonstrate the utility of \( relabel \) and \((\langle * \rangle )\) by rewriting Example 3.
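The two primitives can be sketched under the same hypothetical encoding used earlier in this rewrite; the operator is written `<**>` below only because \(\langle * \rangle \) is not valid Haskell syntax, and the bodies are illustrative stubs:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
-- Hypothetical encoding (illustrative, not MAC's internals).
data L; data H
class Flows l l'
instance Flows L L
instance Flows L H
instance Flows H H

newtype Labeled l a = MkLabeled a

-- Copies, and possibly upgrades, a labeled value: requires l ⊑ l'.
relabel :: Flows l l' => Labeled l a -> Labeled l' a
relabel (MkLabeled a) = MkLabeled a

-- Applies a labeled function to a labeled argument at the same label.
(<**>) :: Labeled l (a -> b) -> Labeled l a -> Labeled l b
MkLabeled f <**> MkLabeled a = MkLabeled (f a)
```

Aggregation then follows the pattern of Example 4: first `relabel` each source to a common upper bound, then combine with `<**>`; no thread is spawned and the current label is untouched.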

Fig. 12. Extended API for labeled values

Example 4

Alice easily modifies \( reportPlan \) as follows:

figure g

The third-party function (\( financialPlan \)) is relabeled to an upper bound of the two labels involved, which the lattice justifies, and then applied to the user data (\( financialPlan' \;\langle * \rangle \; user \)) using the applicative (functor) operator. Note that the result remains labeled with that upper bound.

Discussion. In functional programming, operator \((\langle * \rangle )\) is part of the applicative functor [20] interface, which, in combination with \( fmap \), is used to map functions over functors. Note that if labeled values fully enjoyed the applicative functor structure, our API would also include the primitive \( pure \mathbin {::} a \rightarrow Labeled \;\ell \; a \). This primitive brings arbitrary values into labeled values, which might break the security principles enforced by MAC. Instead of \( pure \), MAC centralizes the creation of labeled values in the primitive \( label \). Observe that, by using \( pure \), a programmer could write a computation where the created labeled information is sensitive rather than public. We argue that this situation ignores the no write-down principle, which might confuse developers. More importantly, freely creating labeled values is not compatible with the security notion of clearance, where secure computations have an upper bound on the kind of sensitive data they can observe and generate. This notion becomes useful to address certain covert channels [40] as well as poison-pill attacks [13]. While MAC does not currently support clearance, we leave this research direction as future work.

5 Security Guarantees

This section presents the core part of our formalization of MAC as a simply typed call-by-name \(\lambda \)-calculus extended with booleans, unit values, and monadic operations. Note that our mechanized proofs, available online (Footnote 2), cover the full calculus, which also includes references, synchronization variables, and exceptions. Given the number of advanced features in the calculus, a proof assistant proved to be an invaluable tool for verifying the correctness of our proofs. Figure 13 shows the formal syntax. Metavariables \(\tau \), \( v \), and \( t \) denote types, values, and terms, respectively. Most of these syntactic categories are self-explanatory, with the exception of a few cases that we proceed to clarify. We note that, even though labels are actual types in MAC, we use a separate syntactic category \(\ell \) for clarity in this calculus. Furthermore, we assume that labels form a lattice \((\mathscr {L},\sqsubseteq )\). Constructors \( MAC \) and \( Res \) represent a secure computation and a labeled resource, respectively. The latter is an established technique to lift arbitrary resources such as references and synchronization variables into MAC [31]. \( MAC \) and \( Res \) are MAC’s internal constructors; therefore they are not available to users of the library and are not part of the surface syntax. Data type \( Id \;\tau \) denotes an expression of type \(\tau \), and \( Res \;( Id \; t )\) represents a labeled expression \( t \), which we abbreviate as \( Labeled \; t \). Similarly, we write \( Labeled \;\ell \;\tau \) for the type \( Res \;\ell \;( Id \;\tau )\). Node \(\langle * \rangle \) corresponds to the applicative (functor) operator and is overloaded for \( Labeled \;\ell \; t \) and \( Id \;\tau \). Every applicative functor is also a functor [20], hence \( fmap \; f \; x \) is simply defined as \(( Labeled \; f )\;\langle * \rangle \; x \).
The special syntax nodes \(\bullet \), \(\langle * \rangle _{\bullet }\), and \(relabel_{\bullet }\) represent erased terms and are used by our proof technique to examine the security guarantees of the calculus.

Fig. 13. Formal syntax for types, values, and terms.

Types. The typing judgment \(\Gamma \vdash t \mathbin {:}\tau \) denotes that term \( t \) has type \(\tau \) assuming the typing environment \(\Gamma \). All the typing rules are standard and thus omitted, except for \(\bullet \) which can assume any type, i.e. \(\Gamma \vdash \bullet \mathbin {:}\tau \).

Fig. 14. Semantics for non-standard constructs.

Semantics. The small-step semantics of the calculus is represented by the relation \( t _{1}\leadsto t _{2}\), which denotes that \( t _{1}\) reduces to \( t _{2}\) in one step. Most of the rules are standard and hence omitted; the rules for the interesting constructs are shown in Fig. 14. Term \(\bullet \) merely reduces to itself according to rule [Hole]. Rule \([\textsc {Labeled}\langle * \rangle ]\) describes the semantics of operator \(\langle * \rangle \), which applies a labeled function to a labeled value. Terms \( t _{1}\) and \( t _{2}\) are wrapped in \( Id \) so they cannot be combined by plain function application. As rule \([\textsc {Id}\langle * \rangle ]\) shows, \( Id \) is also an applicative operator and therefore \(\langle * \rangle \) is used instead. Observe that symbol \(\langle * \rangle \) is overloaded, where the type of its argument determines which rule to apply, i.e. either \([\textsc {Labeled}\langle * \rangle ]\) or \([\textsc {Id} \langle * \rangle ]\). Rule \([\textsc {Id} \langle * \rangle ]\) requires the function to be in weak-head normal form (\((\lambda x . t _{1})\; t _{2}\)), where beta reduction occurs right away. (As usual, we write \([ t _{1}\mathbin {/} x ]\; t _{2}\) for the capture-avoiding substitution of every occurrence of \( x \) with \( t _{1}\) in \( t _{2}\).) This way of writing the rule is unusual, since one would expect \( Id \; f \;\langle * \rangle \; Id \; t \leadsto Id \;( f \; t )\). Nevertheless, the eagerness of \(\langle * \rangle \) in its first argument is needed for technical reasons in order to guarantee non-interference. Rule [Relabel] upgrades the label of a labeled value. Since relabeling occurs at the level of types, the reduction rule simply creates another labeled term. Finally, rule [Unlabel] extracts the labeled value and returns it in a computation at the appropriate security level.
We omit the two context rules that first reduce the labeled value to weak-head normal form and then the expression itself.

5.1 Sequential Calculus

Fig. 15. Commutative diagram

In this section, we prove progress-insensitive non-interference for our calculus. Similar to other work [19, 30, 37], we employ the term erasure proof technique. To that end, we introduce an erasure function which rewrites sensitive information, i.e. data above the security level of the attacker, to the term \(\bullet \). Since security levels are at the type level, the erasure function is type-driven. We write \(\varepsilon _{\ell _{ A }}^{\tau }( t )\) for the erasure of data above the attacker’s security level \(\ell _{ A }\) from a term \( t \) of type \(\tau \). We omit the type superscript when it is either irrelevant or clear from the context. Figure 15 highlights the intuition behind the proof technique: showing that the drawn diagram commutes. More precisely, we show that erasing sensitive data from a term \( t \) and then taking a step (lower part of the diagram) is the same as first taking a step (upper part of the diagram) and then erasing sensitive data. If term \( t \) leaks data whose sensitivity label is above \(\ell _{ A }\), then erasing all sensitive data and taking a step might not be the same as taking a step and then erasing secret values—the leaked sensitive data in \( t '\) might remain in \(\varepsilon _{\ell _{ A }}^{\tau }( t ')\).
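Concretely, commutativity of the diagram amounts to a single-step simulation between a term and its erased counterpart, stated here informally (the mechanized proof works with well-typed terms):

```latex
% Erasure and reduction commute: stepping then erasing equals
% erasing then stepping, for every well-typed term t.
\text{if } t \leadsto t' \text{ then } \varepsilon_{\ell_A}(t) \leadsto \varepsilon_{\ell_A}(t')
```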

Fig. 16.
figure 16

Erasure function.

Fig. 17.
figure 17

Reduction rules for \(\langle * \rangle _{\bullet }\) and \(relabel_{\bullet }\).

Figure 16 shows the definition of the erasure function for the interesting cases. Before explaining them, we remark that ground values (e.g., \( True \)) are unaffected by the erasure function and that, for most terms, the function is applied homomorphically, e.g., \(\varepsilon _{\ell }^{()}(\mathbf {if}\ t _{1}\ \mathbf {then}\ t _{2}\ \mathbf {else}\ t _{3})\) = \(\mathbf {if}\ \varepsilon _{\ell }^{ Bool }( t _{1})\ \mathbf {then}\ \varepsilon _{\ell }^{()}( t _{2})\ \mathbf {else}\ \varepsilon _{\ell }^{()}( t _{3})\). Labeled resources are erased according to the label found in their type (\( Res \;\ell \;\tau \)). If the attacker can observe the term (\(\ell \;\sqsubseteq \;\ell _{ A }\)), the erasure function is applied homomorphically; otherwise, the term is replaced with \(\bullet \). In principle, one might be tempted to apply the erasure function homomorphically for \(\langle * \rangle \) and \( relabel \) as well, but such an approach unfortunately breaks the commutativity of Fig. 15. To illustrate this point, consider the term \(( Res \; f )\;\langle * \rangle \;( Res \; x )\) of type , which reduces to \( Res \;( f \;\langle * \rangle \; x )\) according to rule \([\textsc {Labeled} \langle * \rangle ]\). By applying the erasure function homomorphically, we get , that is \(( Res \;\bullet )\;\langle * \rangle \;( Res \;\bullet )\), which reduces to \( Res \;(\bullet \;\langle * \rangle \;\bullet )\not \equiv Res \;\bullet \). Operator \( relabel \) raises a similar problem. Consider for example the term , where . If the erasure function were applied homomorphically, i.e. consider , sensitive data produced by \( relabel \) would remain after erasure, breaking commutativity. Instead, we perform erasure in two steps, a novel technique compared with previous work (e.g., [37]). Rather than being a purely syntactic procedure, erasure is also performed by additional evaluation rules, triggered by special constructs introduced by the erasure function.
Specifically, the erasure function replaces \(\langle * \rangle \) with \(\langle * \rangle _{\bullet }\), and erasure is then performed by rule \([\textsc {Labeled} \langle * \rangle {}_\bullet ]\) (see Fig. 17). Following the same scheme, the erasure function replaces \( relabel \) with \(relabel_{\bullet }\), and rule \([\textsc {Relabel}{}_\bullet ]\) performs the erasure. Operators \(\langle * \rangle _{\bullet }\) and \(relabel_{\bullet }\) and their semantic rules are introduced for purely technical reasons (as explained above); they do not impact the performance of MAC since they are not part of its implementation. Finally, terms of type \( MAC \;\ell \;\tau \) are replaced by \(\bullet \) when the computation is more sensitive than the attacker level (\(\ell \;\not \sqsubseteq \;\ell _{ A }\)); otherwise, the erasure function is applied homomorphically.
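The effect of the bullet-aware operator can be sketched on a tiny term language. In this hypothetical encoding (all names ours, the label left implicit and assumed secret), `ApB` stands for \(\langle * \rangle _{\bullet }\): whereas homomorphic erasure of \(( Res \; f )\;\langle * \rangle \;( Res \; x )\) would get stuck at \( Res \;(\bullet \;\langle * \rangle \;\bullet )\), the dedicated rule collapses the whole application to \( Res \;\bullet \), restoring the commutativity of Fig. 15.

```haskell
-- Toy reduction illustrating two-step erasure (illustrative names only).
data Term = Lab Term          -- Res t, label implicit (secret)
          | Ap  Term Term     -- t1 <*> t2
          | ApB Term Term     -- t1 <*>_• t2, introduced by erasure
          | Hole              -- •
          deriving (Eq, Show)

step :: Term -> Term
step (Ap (Lab f) (Lab x)) = Lab (Ap f x)  -- [Labeled <*>]
step (ApB _ _)            = Lab Hole      -- [Labeled <*>_•]: collapse to Res •
step t                    = t             -- other terms: no reduction here
```

Note how `step (ApB …)` discards its arguments entirely, which is exactly why the homomorphically erased residue \(\bullet \;\langle * \rangle \;\bullet \) can no longer survive a reduction step.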

Progress-Insensitive Non-interference. The non-interference proof relies on two fundamental properties of our calculus: determinism and distributivity.

Proposition 1

(Sequential determinacy and distributivity)

  • If \( t _{1}\leadsto t _{2}\) and \( t _{1}\leadsto t _{3}\) then \( t _{2}\mathrel {=} t _{3}\).

  • If \( t _{1}\leadsto t _{2}\) then \(\varepsilon _{\ell _{ A }}( t _{1})\leadsto \varepsilon _{\ell _{ A }}( t _{2})\).

In the proof of Proposition 1, we use the auxiliary property that erasure distributes over substitution, i.e. \(\varepsilon _{\ell _{ A }}([ t _{1}\mathbin {/} x ]\; t _{2})\mathrel {=}[\varepsilon _{\ell _{ A }}( t _{1})\mathbin {/} x ]\;\varepsilon _{\ell _{ A }}( t _{2})\). Note, however, that the erasure function does not always distribute over function application, i.e. \(\varepsilon _{\ell _{ A }}^{\tau }( t _{1}\; t _{2})\not \equiv \varepsilon _{\ell _{ A }}( t _{1})\;\varepsilon _{\ell _{ A }}( t _{2})\) when \(\tau \mathrel {=} MAC \; h \;\tau '\) and \( h \;\not \sqsubseteq \;\ell _{ A }\). It is precisely for this reason that rule \([\textsc {Id} \langle * \rangle ]\) performs substitution rather than function application. Before stating non-interference, we formally define \(\ell _{ A }\)-equivalence.

Definition 1

(\(\ell _{ A }\)-equivalence). Two terms are indistinguishable to an attacker at security level \(\ell _{ A }\), written \( x \approx _{\ell _{ A }} y \), if and only if \(\varepsilon _{\ell _{ A }}( x )\mathrel {=}\varepsilon _{\ell _{ A }}( y )\).

Using Proposition 1, we show that our semantics preserves \(\ell _{ A }\)-equivalence.

Proposition 2

(\(\ell _{ A }\) -equivalence preservation). If \( t _{1}\approx _{\ell _{ A }} t _{2}\), \( t _{1}\leadsto t _{1}'\), and \( t _{2}\leadsto t _{2}'\), then \( t _{1}'\approx _{\ell _{ A }} t _{2}'\).

We finally prove progress-insensitive non-interference for the sequential calculus. We employ big-step semantics, denoted by \( t \;\Downarrow \; v \), which reduces term \( t \) to value \( v \) in a finite number of steps.
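The big-step relation \( t \;\Downarrow \; v \) can be understood as iterating the small-step relation until a value is reached. A minimal sketch of this view, with hypothetical names (`bigStep`, and a small-step function returning `Nothing` on values):

```haskell
-- Big-step evaluation as iterated small-step reduction.
-- `step` returns Nothing when its argument is already a value.
bigStep :: (t -> Maybe t) -> t -> t
bigStep step t = maybe t (bigStep step) (step t)

-- Toy small-step relation: a counter reduces to zero, its value.
stepN :: Int -> Maybe Int
stepN n | n > 0     = Just (n - 1)
        | otherwise = Nothing
```

Here `bigStep stepN 5` performs five small steps and terminates, matching the intuition that \(\Downarrow \) reduces a term to a value in a finite number of steps.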

Theorem 1

(PINI). If \( t _{1}\approx _{\ell _{ A }} t _{2}\), \( t _{1}\;\Downarrow \; v _{1}'\), and \( t _{2}\;\Downarrow \; v _{2}'\), then \( v _{1}'\approx _{\ell _{ A }} v _{2}'\).

Fig. 18.
figure 18

Syntax for the concurrent calculus.

5.2 Concurrent Calculus

Figure 18 extends the calculus from Sect. 5 with concurrency. It introduces global configurations of the form \(\langle s,\varPhi \rangle \), composed of an abstract scheduler state \(s\) and a thread pool \(\varPhi \). Threads are secure computations of type \( MAC \;\ell \;()\), organized in isolated thread pools according to their security label. A pool \(t_s\) in the category \( Pool \;\ell \) contains exclusively threads at security level \(\ell \). We use the standard list interface \([]\), \( t \mathbin {:}t_s\), and \(t_s{[}n{]}\) for the empty list, insertion of a term into an existing list, and access to the nth element, respectively. We write \(\varPhi {[}\ell {]}{[}n{]}\mathrel {=} t \) to retrieve the nth thread in the \(\ell \)-thread pool; it is syntactic sugar for \(\varPhi (\ell )\mathrel {=}t_s\) and \(t_s{[}n{]}\mathrel {=} t \). The notation \(\varPhi {[}\ell {]}{[}n{]}\mathbin {:=} t \) denotes the thread pool obtained by performing the update \(\varPhi (\ell ) [ n \mapsto t ]\). Reading from an erased thread pool results in an erased thread, i.e. \(\bullet {[}n{]}\mathrel {=}\bullet \), and updating it has no effect, i.e. \(\bullet [ n \mapsto t ]\mathrel {=}\bullet \).
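A plausible Haskell rendering of the thread-pool interface uses a map from labels to lists of threads. The function names `lookupT` and `updateT` are ours, standing in for the paper's \(\varPhi {[}\ell {]}{[}n{]}\) notation; the erased-pool cases (\(\bullet \)) are omitted from this sketch.

```haskell
import qualified Data.Map as Map

-- Thread pools indexed by security label (illustrative encoding).
type Pool t    = [t]
type Pools l t = Map.Map l (Pool t)

-- Phi[l][n] = t : retrieve the nth thread of the l-pool.
lookupT :: Ord l => Pools l t -> l -> Int -> Maybe t
lookupT phi l n = do
  ts <- Map.lookup l phi
  if n >= 0 && n < length ts then Just (ts !! n) else Nothing

-- Phi[l][n] := t : replace the nth thread of the l-pool,
-- leaving every other pool untouched (pool isolation).
updateT :: Ord l => Pools l t -> l -> Int -> t -> Pools l t
updateT phi l n t = Map.adjust upd l phi
  where upd ts = [ if i == n then t else t' | (i, t') <- zip [0 ..] ts ]
```

Because `updateT` only touches the pool at label `l`, threads at other security levels are unaffected, mirroring the isolation of pools by label.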

Fig. 19.
figure 19

Rule scheme for the concurrent semantics.

Semantics. The relation \(\hookrightarrow _{(\ell , n )}\) represents an evaluation step for global configurations in which the thread identified by \((\ell , n )\) is scheduled. Figure 19 shows the rule scheme for \(\hookrightarrow _{(\ell , n )}\). The scheduled thread is retrieved from the configuration (\(\varPhi {[}\ell {]}{[}n{]}\mathrel {=} t _{1}\)) and executed (\( t _{1}\;\leadsto _e\; t _{2}\)). We decorate the sequential semantics with events \( e \), which provide the scheduler with information about the effects produced by the scheduled instruction, for example \(\bullet \) \(\leadsto _{\bullet }\) \(\bullet \). Events inform the scheduler about the evolution of the global configuration, so that it can implement concrete scheduling policies. The relation \( s_1\) \(\xrightarrow {(\ell , n , e )}\) \( s_2\) represents a transition of the scheduler that, depending on the initial state \( s_1\), decides to run the thread identified by \((\ell , n )\) and updates its state according to the event \( e \). Lastly, the thread pool is updated with the final state of the thread (\(\varPhi {[}\ell {]}{[}n{]}\mathbin {:=} t _{2}\)).
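One concrete, deterministic instance of such a scheduler is round-robin. The sketch below is hypothetical (the `Event` constructors and the function `schedule` are our own names, not part of the calculus): the scheduler state is a queue of thread identifiers \((\ell , n )\), and the continuation returned by `schedule` plays the role of the transition \( s_1 \xrightarrow {(\ell , n , e )} s_2\), updating the state according to the event produced by the decorated sequential step.

```haskell
-- Hypothetical events produced by the decorated sequential semantics.
data Event = Step      -- ordinary reduction step
           | Done      -- thread terminated (or is stuck)
           | Fork Int  -- thread spawned a new thread with the given index
           deriving (Eq, Show)

-- Round-robin scheduler state: a queue of thread identifiers (l, n).
type RR l = [(l, Int)]

-- Pick the next thread; the returned function computes the next state
-- once the event produced by the thread's step is known.
schedule :: RR l -> Maybe ((l, Int), Event -> RR l)
schedule []           = Nothing
schedule (tid : rest) = Just (tid, next)
  where
    next Done     = rest                         -- drop a finished thread
    next (Fork m) = rest ++ [tid, (fst tid, m)]  -- enqueue the new thread
    next _        = rest ++ [tid]                -- requeue at the back
```

Being a pure function of its queue, this scheduler is deterministic, and it never inspects thread contents, which is the kind of scheduler non-interference the proof assumes below.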

Progress-Sensitive Non-interference. Our concurrent calculus satisfies progress-sensitive non-interference, a security condition often enforced by IFC techniques for the \(\pi \)-calculus [12, 27]. A global configuration is erased by erasing its components, that is \(\varepsilon _{\ell _{ A }}(\langle s,\varPhi \rangle )\mathrel {=}\langle \varepsilon _{\ell _{ A }}( s),\varepsilon _{\ell _{ A }}(\varPhi )\rangle \). The thread pool \(\varPhi \) is erased point-wise: pools are completely collapsed if not visible to the attacker, i.e. \(\varepsilon _{\ell _{ A }}^{ Pool \;\ell }(t_s)\mathrel {=}\bullet \) if \(\ell \;\not \sqsubseteq \;\ell _{ A }\); otherwise the erasure function is homomorphically applied to their content. The erasure of the scheduler state \( s\) is scheduler specific. To obtain a parametric proof of non-interference, we assume certain properties of the scheduler. Specifically, our proof is valid for deterministic schedulers which themselves fulfill progress and non-interference, i.e. schedulers cannot leverage sensitive information in threads to determine what to schedule next. As for the sequential calculus, we rely on determinacy and distributivity of the concurrent semantics.

Proposition 3

(Concurrent determinacy and distributivity)

  • If \( c _{1}\) \(\hookrightarrow _{(\ell , n )}\) \( c _{2}\) and \( c _{1}\) \(\hookrightarrow _{(\ell , n )}\) \( c _{3}\), then \( c _{2}\mathrel {=} c _{3}\).

  • If \( c _{1}\) \(\hookrightarrow _{(\ell , n , e )}\) \( c _{2}\), then it holds that \(\varepsilon _{\ell _{ A }}( c _{1})\) \(\hookrightarrow _{(\ell , n ,\varepsilon _{\ell _{ A }}( e ))}\) \(\varepsilon _{\ell _{ A }}( c _{2})\).

In the non-interference theorem, we write \(\hookrightarrow ^\star \) as usual for the reflexive-transitive closure of \(\hookrightarrow \), and we generalize \(\approx _{\ell _{ A }}\) to denote \(\ell _{ A }\)-equivalence between configurations.

Theorem 2

(Progress-sensitive non-interference). Given the global configurations \( c _{1}\), \( c _{1}'\), \( c _{2}\), and assuming a deterministic and non-interfering scheduler that makes progress, if \( c _{1}\approx _{\ell _{ A }} c _{2}\) and \( c _{1}\) \(\hookrightarrow _{(\ell , n )}\) \( c _{1}'\), then there exists \( c _{2}'\) such that \( c _{2}\) \(\hookrightarrow ^\star \) \( c _{2}'\) and \( c _{1}'\approx _{\ell _{ A }} c _{2}'\).

6 Related Work

Security Libraries. Li and Zdancewic's seminal work [18] shows how the structure of arrows can provide IFC as a library in Haskell. Tsai et al. [39] extend that work to support concurrency and data with heterogeneous labels. Russo et al. [30] implement the security library SecLib using a simpler structure than arrows, namely monads; rather than labeled values, this work introduces a monad which statically labels side-effect-free values. The security library LIO [36, 37] dynamically enforces IFC in both sequential and concurrent settings. LIO presents operations similar to \( fmap \) and \(\langle * \rangle \) for labeled values, with differences in the return type due to LIO's checks for clearance; our work provides a foundation to analyze the security implications of such primitives. Mechanized proofs for LIO are given only for its core sequential calculus [37]. Inspired by SecLib's and LIO's designs, MAC leverages Haskell's type system to enforce IFC [31]; that work does not contain formal guarantees and relies on its simplicity to convince the reader of its correctness. HLIO uses advanced features of Haskell's type system to provide a hybrid approach: IFC is statically enforced while allowing programmers to defer selected security checks to runtime [6]. Our work studies the security implications of extending LIO, MAC, and HLIO with a rich structure for labeled values. Devriese and Piessens provide a monad transformer to extend imperative-like APIs with support for IFC in Haskell [8]. Jaskelioff and Russo implement a library which dynamically enforces IFC using secure multi-execution (SME) [15], a technique that runs programs multiple times. Rather than running multiple copies of a program, Schmitz et al. [33] provide a library with faceted values, where values present different behavior according to the privilege of the observer.
Different from the work above, we present a fully-fledged mechanized proof for our sequential and concurrent calculi, which include references, synchronization variables, and exceptions.

IFC tools. IFC research has produced compilers capable of preserving confidentiality of data: Jif [25] and Paragon [4] (based on Java), and FlowCaml [34] (based on Caml). The SPARK language presents an IFC analysis which has been extended to guarantee progress-sensitive non-interference [28]. JSFlow [11] is one of the state-of-the-art IFC systems for the web (based on JavaScript). These tools preserve confidentiality in a fine-grained fashion where every piece of data is explicitly labeled. Specifically, there is no abstract data type for labeling data, so our results cannot be directly applied to them.

Operating systems research. MAC systems [3] assign a label to an entire OS process, setting a single policy for all the data it handles. While proposed in the 70s, there are modern manifestations of this idea (e.g., [17, 23, 40]) applied to diverse scenarios like the web (e.g., [2, 38]) and mobile devices (e.g., [5, 16]). In principle, it would be possible to extend such MAC-like systems with a notion of labeled values carrying the functor structure as well as the relabeling primitive proposed by this work. For instance, COWL [38] presents the notions of labeled blob and labeled XHR, which are isomorphic to the notion of labeled values, thus making it possible to apply our results. Furthermore, because many MAC-like systems often ignore termination leaks (e.g., [9, 40]), there is no need to use call-by-name evaluation to obtain security guarantees.

7 Conclusions

We present an extension of MAC that provides labeled values with an applicative-functor-like structure and a relabeling operation, enabling convenient and expressive manipulation of labeled values using side-effect-free code and saving programmers from introducing unnecessary sub-computations (e.g., in the form of threads). We have proved this extension secure in both sequential and concurrent settings, exposing an interesting connection between evaluation strategy and progress-sensitive non-interference. This work bridges the gap between existing IFC libraries (which focus on side-effecting code) and the usual Haskell programming model (which favors pure code), with a view to making IFC in Haskell more practical.