Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

While writing a concurrent program, a programmer often prefers to have non-interfered access to shared data that is manipulated by a thread, since this permits the reasoning about the correctness of the code to be done locally and therefore simplifies the process. Atomicity is a generic correctness criterion that is inspired by this view. Informally, an atomic code block has the same behaviour under interfering actions of other threads as it does when executed without interference (serially). Establishing atomicity of code blocks eases the task of reasoning about the program by substantially reducing the number of interleavings that need to be considered. Moreover, non-atomicity hints at the existence of potential bugs; a study of concurrency errors [20] shows that a majority of reported errors in concurrent programs (around \(69\,\%\)) are atomicity violations.

Several notions of atomicity have been introduced in the literature. A widely recognized notion is conflict serializability [21], introduced as a correctness criterion with a tractable monitoring algorithm that guarantees atomicity. It is assumed that a program’s code is divided into code blocks (such as procedures, loop bodies, or even single statements) that are called transactions. An execution is conflict serializable if it is equivalent to a serial execution, i.e. an execution in which all transactions are executed in a sequential non-interleaved fashion. The key element of this definition is the notion of equivalence which allows permutation of non-conflicting statements to establish an equivalent serial execution.

There has been a huge body of research in the recent years that studies the problems of static and dynamic checking of atomicity, which is almost entirely based on the assumption that the programs are executed under a sequentially consistent (SC) memory model. Weak memory models have been duly getting a lot of attention in the programming languages and systems research communities, and yet the question of atomicity under a weak memory model has not been studied well. Let us start by an example to motivate why weak memory models require a carefully tailored notion of atomicity.

Fig. 1.
figure 1

Task pool (transactions marked by brackets).

Consider the program with two methods in Fig. 1. Array pool implements a pool of tasks with two pointers head and tail pointing to its beginning and end. The invariant is head \(\mathtt{\le }\) tail and the pool is empty if head = tail. The procedures (a) and (b) take elements from the pool’s head and tail, respectively. Imagine a program that is running these two procedures in two threads (transactions are marked by brackets in the figure). Once a thread atomically modifies head/tail, interference from the other thread is tolerated. But, when it is about to modify the pool, it requires mutual exclusion. It is easy to verify that every execution of this program is conflict serializable (under sequential consistency). Even though both (a) and (b) potentially write to an element of the array pool, the conditional ensures that it is never the same element. Now consider the same program executed under the Total Store Order (TSO) memory model where writes are first stored in a thread-local buffer and non-deterministically flushed into the shared memory at a later time. When head + 1 = tail, the if condition may succeed in both (a) and (b). A write to head performed by (a) may be propagated to (b) after the condition is tested in (b), conversely a write to tail performed by (b) may be propagated to (a) after the condition is tested in (a). This is the behaviour that is strictly disallowed under SC. In that case, both threads access the same element of the array pool by first reading it and then writing to it. This is a classic violation of atomicity. Moreover, assuming that the threads are grabbing tasks from this task pool to execute, this non-atomic behaviour can lead to a real program error if a non-idempotent task ends up being executed twice by two different threads. We need a notion of atomicity that is aware of such erroneous TSO-executions, and declares them as non-atomic.

In this paper, we propose a new notion of atomicity, called TSO-serializability, which is inspired by the standard notion of conflict serializability under SC, in the sense that is syntactic, efficient to monitor, and helpful for the programmer to facilitate local reasoning. Yet, it makes special considerations for (i) non-atomicity of writes to the shared memory under TSO, and (ii) possible reorderings of shared memory accesses made by the same thread, allowed under TSO but not under SC. The idea is that TSO-serializability lifts the relaxedness of the orderings of individual statements under TSO to the level of atomic blocks (viewed as composite statements). For example, since TSO allows for two statements \( write (x) read (y)\) to be reordered to \( read (y)\) \( write (x)\) (indicating that the write is committed later), therefore we expect the two-transaction sequence \([ write (x_1) write (x_2)]\) \([ read (y_1) read (y_2)]\) to be allowed to be reordered to \([ read (y_1) read (y_2)]\) \([ write (x_1) write (x_2)]\) in an equivalent execution.

We provide a formal justification for the notion of TSO-serializability presented in this paper by stating its precise relation to SC-serializability and robustness. Robustness [6] is a property of a program stating that the program does not exhibit non-SC behaviour if executed on a weaker memory model such as TSO. If a program is robust, and it is SC-serializable, then for any reasonable notion of TSO-serializability, one should expect it to be serializable under TSO. That is exactly what we prove for our proposed notion of TSO-serializability. The converse, however, does not always hold. If a program exhibits strictly more behaviours under TSO (compared to SC), it is expected that some of these behaviours may not serializable, while all SC behaviours are.

Since TSO-serializability is formulated based on the concept of a syntactic conflict relation (similar to standard SC-serializability), a monitoring algorithm for TSO-serializability can be adapted from the classic algorithm for conflict serializability effortlessly; a program execution can be monitored for TSO-serializability violations using a similar algorithm as SC-serializability [21] and in the same polynomial time complexity. There is, however, a practical impediment in the way of monitoring programs for TSO-serializability violations, and that is how to obtain an execution to monitor in the first place. To obtain a detailed TSO execution (including the information about when writes were committed to memory), the monitor needs access to inner workings of the cache coherence protocol. This implies a very complicated monitor design which will likely have huge performance setbacks. Conceptually, there is a lightly distributed system that needs to be monitored, and observing global snapshots of which are costly.

We propose the notion of traces, as an abstraction of executions (in the form of a set of executions) which forgets information about the exact time of write commits. In a trace, once a write is issued by a thread, it can be committed at any point in the future, consistently with all the other accesses in the trace. We pose and solve the problem of monitoring a trace for TSO-serializability violations. Since a trace represents a set of executions, it is expected that this problem should be more complex than the monitoring problem of a single execution. We prove that the problem is in general NP-complete, but fixed-parameter tractable. We propose an algorithm to solve it in polynomial time if the number of threads in the program is considered to be a constant.

2 Multithreaded Programs and Their Executions

Events. A program consists of a number of threads running concurrently and communicating through shared variables. Each thread runs a sequence of transactions, which are themselves sequences of events. We fix arbitrary sets \(\mathbb {T}\), \(\mathbb {T}r\), \(\mathbb {V}\), and \(\mathbb {D}\) of thread identifiers, transaction identifiers, variable names, and values.

For a given thread identifier t, we fix the sets \(\mathbb {R}_t=\{ rd _{t}(x,v)_{i}: i\in \mathbb {T}r, x\in \mathbb {V}, v\in \mathbb {D}\}\) and \(\mathbb {W}_t=\{ wr _{t}(x,v)_{i}: i\in \mathbb {T}r, x\in \mathbb {V}, v\in \mathbb {D}\}\) of read and write events. Events are indexed by thread and transaction identifiers. Fence events (which concern the internal workings of TSO and which are explained later in this section) are denoted by \( fn _{t}^{i}\). We omit the transaction identifier i when it is understood from the context, or it is irrelevant. Let \(\mathbb {E}_t=\mathbb {R}_t\cup \mathbb {W}_t\cup \{ fn _{t}^{i}: i\in \mathbb {T}r\}\) and \(\mathbb {E}=\bigcup _t \mathbb {E}_t\).

Programs. A sequence of events \(\sigma \) is called serial when every two events of the same transaction are not separated by an event of another transaction, and well-formed when each transaction identifier is used at most once and for each thread t, the projection of \(\sigma \) on events of thread t is serial. A program P is abstractly represented as a prefix-closed set of well-formed sequences of events (representing all possible interleavings of events of different threads). The semantics of a program P for a specific memory model consists only of those sequences that are feasible under that memory model.

Memory Models. An SC-execution is a sequence of events \(\eta \in \mathbb {E}^*\) where roughly, each read event reads the value written by the last preceding write. An SC-execution of a program P is an SC-execution \(\eta \) such that \(\eta \in P\).

Under TSO, a write \( wr _{t}(x,v)_{i}\) (called also a write-issue) is first stored in a thread-local FIFO buffer, called the store buffer, before being non-deterministically flushed into the shared memory. The written value may become visible to other threads at a later time. Flushing the store buffers introduces additional events \( wr\text {-}com _{t}(x,v)_{i}\), called write-commit events, for removing a write \( wr _{t}(x,v)_{i}\) from the store buffer of t and execute it on the shared memory. We say that the write-commit \( wr\text {-}com _{t}(x,v)_{i}\) corresponds to that write, and denote it by \( wr _{t}(x,v)_{i}\sim wr\text {-}com _{t}(x,v)_{i}\). Write-commits inherit the transaction identifier of the corresponding write-issue (regardless of when they occur). A read \( rd _{t}(x,v)\) prefetches the value v written by the last write to x in the buffer of t, and if no such write exists, the value v is retrieved from the shared memory. A fence event \( fn _{t}\) is enabled only when the buffer of t is empty. Let \(\mathbb {W}c_t =\{ wr\text {-}com _{t}(x,v)_{i}: i\in \mathbb {T}r,x\in \mathbb {V}, v\in \mathbb {D}\}\), \(\mathbb {E}^{tso}_t=\mathbb {E}_t\cup \mathbb {W}c_t\), and \(\mathbb {E}^{tso}=\bigcup _t \mathbb {E}^{tso}{_t}\). For any \(e\in \{ rd _{t}(x,v)_{i}, wr _{t}(x,v)_{i}, wr\text {-}com _{t}(x,v)_{i}\}\), \(th(e)=t\) and \(var(e)=x\). A sequence of events \(\eta \in (\mathbb {E}^{tso})^*\) satisfying this semantics is called a TSO-execution. A TSO-execution of a program P is a TSO-execution \(\eta \) such that the projection of \(\eta \) on \(\mathbb {E}\) belongs to P. Figure 2(a) pictures a TSO-execution of the program in Fig. 1.

3 Conflict Serializability

Conflict serializability was introduced in [21] as a syntactic (and tractable to monitor) notion that ensures atomicity. Instead of considering the data manipulated by transactions, a conservative “conflict relation”, relating the individual actions of transactions, is defined which guarantees atomicity regardless of the data values read and written by individual actions. A conflict relation relates events with their values projected away, that we also call events (and inherit all the notations from Sect. 2 for sets of events). Conflict serializability is a property of a sequence of events (without values), which are also called SC/TSO-executions. Note that such a sequence represents a set of executions, where different values can be assigned to individual events (consistently).

Formally, a conflict relation is an irreflexive binary relation \(\circledcirc \subseteq \mathbb {E} \times \mathbb {E}\). For a pair of events \(e,e' \in \mathbb {E}\), we write \(e \circledcirc e'\) to stand for \((e,e') \in \ \circledcirc \) and \(e \not \circledcirc e'\) to stand for \((e,e') \not {\in } \circledcirc \). Intuitively, whenever \(e \circledcirc e'\), the effect of executing e after \(e'\) may differ from that of executing e before \(e'\). The conflict relation depends on the underlying memory model. For instance, the conflict relation \(\circledcirc _{ SC }\) from [10] assumes sequential consistency: \(e \circledcirc _{ SC }e'\) whenever e and \(e'\) are events of the same thread (i.e., \(th(e)=th(e')\)) or they access the same variable, and one of them is a write (i.e., \((e,e')\in (\mathbb {R}\cup \mathbb {W})^2 \setminus \mathbb {R}^2\) and \(var(e)=var(e')\)).

Given an execution \(\eta = \eta _1 e e' \eta _2\) (where e and \(e'\) are events and \(\eta _1\) and \(\eta _2\) are executions), we say an execution \(\eta ' = \eta _1 e' e \eta _2\) is derived from \(\eta \) by a \(\circledcirc \) -valid swap if and only if \(e \not \circledcirc e'\). A permutation \(\eta '\) of an execution \(\eta \) is \(\circledcirc \) -preserving if and only if \(\eta '\) can be derived from \(\eta \) through a sequence of \(\circledcirc \) -valid swaps.

An execution \(\eta \) is conflict serializable w.r.t. the conflict relation \(\circledcirc \) if and only if there exists an execution \(\eta '\) that is a \(\circledcirc \)-preserving serial permutation of \(\eta \). We call the notion of conflict serializability based on \(\circledcirc _{ SC }\) SC-serializabiliy for short. A program P is SC-serializable iff every SC-execution of P is SC-serializable.

An equivalent characterization of conflict serializability can be established through conflict graphs [21], where the graph was constructed for a specific conflict relation. The same definition can be easily adapted for any conflict relation.

Definition 1

(Event-Graph). The event-graph of an execution \(\eta \) is the directed graph \(EG_{\eta } = \left\langle {V,E} \right\rangle \) where there is a node in V for each event in \(\eta \), and E contains an edge from u to v iff \(e(u) \circledcirc e(v)\) and e(u) occurs before e(v) in \(\eta \) (where e(v) is the event of execution \(\eta \) corresponding to the graph node v).

Intuitively, one can think of the event-graph of an execution \(\eta \) as a structure that represents the order between all conflicting events in \(\eta \).

The conflict-graph of an execution \(\eta \) is defined based on the event-graph of \(\eta \) by grouping all events indexed by the same transaction identifier as a new node, and considering the directed graph that is induced on these new transaction nodes. Let \( tr (v)\) be the set of events that belong to a transaction node v.

Definition 2

(Conflict-Graph). The conflict-graph of an execution \(\eta \) is the directed graph \(CG_{\eta }=\left\langle {V',E'} \right\rangle \) where \(V'\) includes one node for each transaction identifier in \(\eta \), and we have \((v,v') \in E'\) iff there exists events \(e \in tr (v)\) and \(e' \in tr (v')\) such that \((e,e') \in E\) where \(EG_{\eta } = (V,E)\) is the event-graph of \(\eta \).

Theorem 1

(from [21]). For a conflict relation \(\circledcirc \), an execution \(\eta \) is conflict-serializable if and only if \(CG_{\eta }\) is acyclic.

In [21], a polynomial time algorithm is presented that uses the conflict graph and Theorem 1 to monitor an execution under SC for violations of serializability.

Event-graphs and conflict-graphs of SC-executions are defined as in Definitions 1 and 2, respectively, using \(\circledcirc _{ SC }\) instead of \(\circledcirc \).

4 Serializability Under TSO

In this section, we propose a conflict relation for TSO and justify the suitability of the obtained notion of conflict serializability by relating it to the classic SC serializability.

4.1 TSO Conflict Relation

The TSO conflict relation \(\circledcirc _{ TSO }\) is formally defined as follows:

figure a

Similar to the SC conflict relation, \(\circledcirc _{ TSO }\) declares events accessing the same shared memory location, where at least one of them is a write-commit conflicting (see (1) above). However, since under TSO, some read events may access values by reading from a local buffer (instead of the shared memory), there are exceptions to this general rule involving such reads.

figure b

A read \( rd _{t_1}(x)\) event that occurs between a \( wr _{t_1}(x)\) event and the corresponding \( wr\text {-}com _{t_1}(x)\) event, and where \( wr _{t_1}(x)\) is the most recent write-issue event before \( rd _{t_1}(x)\), fetches its value from the store buffer that holds the value written by \( wr _{t_1}(x)\). In this case, according to the TSO semantics, event \( rd _{t_1}(x)\) should not be in conflict with a write-commit \( wr\text {-}com _{t_2}(x)\) of another thread that happens in parallel with it; that is, when \( wr\text {-}com _{t_2}(x)\) occurs between the pair of events \( wr _{t_1}(x)\) and \( wr\text {-}com _{t_1}(x)\) (as illustrated in the figure on the right). Such a pair of parallel read and write-commit events, which we denote by \( rd _{t_1}(x)\,||\, wr\text {-}com _{t_2}(x)\), should not be conflicting since one is a read from a local store buffer and the other a write to the shared memory (accesses to two different resources).

Similar to the SC conflict relation, \(\circledcirc _{ TSO }\) declares events within the same thread to be in conflict (see (2) above). Again, there are exceptions to this rule. A write-commit is not in conflict with other read and write events in the same thread (see (ii) above), except for its corresponding write issue (which must always precede it). Other exceptions (see (iii) above) are related to the relaxations of the program order allowed by the TSO semantics. There is no conflict between a write and a read event of the same thread on different variables. This exception is natural since it extrapolates the behaviour of the memory model at the level of events to the level of transactions, i.e., write-only transactions can be reordered with respect to later read-only transactions. Finally, TSO semantics relaxes the program order between a \( rd _{t}(x)\) event that fetches its value from the store buffer and a future \( rd _{t}(y)\) event of a different variable \(y\ne x\) (see also [1]).

Buffered Reads. The relative ordering of \( wr _{t}(x)\)/\( wr\text {-}com _{t}(x)\) events corresponding to the read event \( rd _{t}(x)\) (of the same thread) determines whether the read fetches its value from the buffer (or the shared memory). Therefore, every read event \( rd _{t}(x)\), that is preceded by a write \( wr _{t}(x)\) of the same thread and no fence event \( fn _{t}\) in between, may or may not be fetching its value from the local buffer, depending on when the write gets committed to the memory. This runtime information is unavailable when a programmer is reasoning at the level of the source code. We choose to call any such read, that may fetch its value from the buffer, a buffered read and exclude the mutual conflicts between these reads and later reads to other variables from \(\circledcirc _{ TSO }\) (see (iii) above). This way, we feel that the definition of conflict relation stays true to its main purpose, i.e. defining a notion atomicity that is helpful to programmers reasoning about their code.

The following proposition formally states the fact that all order relaxations introduced in the definition of \(\circledcirc _{ TSO }\) are consistent with the TSO semantics:

Proposition 1

Any \(\circledcirc _{ TSO }\)-preserving permutation of a TSO-execution \(\eta \) is also a TSO-execution.

Fig. 2.
figure 2

(a) A TSO-execution \(\eta \) (events are ordered from top to bottom) and its \(\circledcirc _{ TSO }\) event-graph \(EG_{\eta }\). (b) Ignoring dashed edges, the write-contraction of \(EG_{\eta }\). Dashed edges represent conflicts added by \(\circledcirc _{ TSO-po }\). Ignoring dashed edges and redefining the highlighted edges to be undirected, the trace event-graph \(EG_{\tau }\) of \(\tau = trace (\eta )\). (c) The conflict-graph induced by \(EG_{\eta }\). (d) The conflict-graph induced by \(EG_{\tau }\)

The notion of conflict serializability based on \(\circledcirc _{ TSO }\) is called TSO-serializability. A program P is TSO-serializable iff every TSO-execution of P is TSO-serializable. Event/conflict-graphs of TSO-executions are defined as in Definitions 1 and 2, respectively, by replacing \(\circledcirc \) with \(\circledcirc _{ TSO }\). An equivalent of Theorem 1 then provides an efficient (poly-time) procedure to monitor an execution for TSO-serializability violations. Figure 2(a) illustrates the event-graph of a non TSO-serializable execution of the program in Fig. 1. The conflict-graph in Fig. 2(c) contains a cycle.

4.2 Connection to SC-Serializability

Beyond Proposition 1, we substantiate our definition of TSO-serializability by formally relating it to the widely accepted notion of SC-serializability. We show that SC-serializability implies TSO-serializability for robust programs. Intuitively, a program is robust if it does not exhibit non-SC behaviour; in other words, each of its TSO-executions is equivalent to another execution of the same program under SC. Under SC, every write-issue is immediately followed by the corresponding write-commit (i.e. no delay in propagating the write).

Let \(\circledcirc _{ TSO-po }\) be a strengthening of \(\circledcirc _{ TSO }\) in which the program order is maintained for all pairs of events in \(\mathbb {E}\) in the same thread. Formally, a TSO-execution \(\eta \) is SC-equivalent when there exists an execution \(\eta '\) that is a \(\circledcirc _{ TSO-po }\)-preserving permutation of \(\eta \) and every write-issue of \(\eta '\) is immediately followed by the corresponding write-commit. A program P is robust when every TSO-execution of P is SC-equivalent. One can check SC-equivalence by letting every pair of write-issue and corresponding write-commit events to form a transaction, and checking conflict serializability of the execution consisting of these transactions and all other events as single transactions (more details in [1]). The conflict graph defined this way is called a write-contraction. For instance, the TSO-execution in Fig. 2(a) is not SC-equivalent (since there is a cycle in Fig. 2(b)) which implies that the program in Fig. 1 is not robust.

Theorem 2

A program P is TSO-serializable if it is robust and SC-serializable.

figure c

The reverse of Theorem 2 doesn’t hold. For instance, both programs above are TSO-serializable although the program on the left is not robust and the program on the right is not SC-serializable. The program on the left is TSO-serializable since every event is a transaction and events in the same thread are not in conflict, and it is not robust since intuitively, both reads don’t see the value written by the other thread. The program on the right is TSO-serializable because the events in thread 1 are not in conflict while it is not SC-serializable since it admits only one execution where the events of thread 1 take place in between the two events of thread 2.

A program P is called transaction-fenced when for every \(\sigma \in P\), every transaction in \(\sigma \), i.e., every maximal sub-sequence of events indexed by the same transaction identifier, ends with a fenceFootnote 1. For transaction-fenced programs, the converse of Theorem 2 is true:

Theorem 3

A transaction-fenced program P is TSO-serializable iff it is robust and SC-serializable.

5 Trace TSO-Serializability

There are practical obstacles in the way of implementing a monitor that can observe a TSO-execution of a program. The monitor is subject to the same distributed nature of the memory as individual program threads, and tracking write-commits of threads requires a manipulation of the cache-coherence protocols running in the multi-core chip with potentially high performance overheads. We introduce a notion of serializability for TSO that does not require to be aware of the exact timing of write-commits. This notion applies to abstractions of TSO-executions called traces that forget write-commits, assuming that a write-commit can happen at any point in time after its corresponding write-issue (consistent with the TSO semantics). This effectively means that the serializability of a set of executions (namely those where the forgotten write-commits reappear at any of the consistent points) is monitored instead of a single execution.

The trace of an execution \(\eta \), denoted by \( trace (\eta )\), is the projection of \(\eta \) on \(\mathbb {E}\) (basically leaving out all write-commits). The set of executions \( Execs (\tau )\) represented by a trace \(\tau \) is the set of all TSO-executions \(\eta \) such that \( trace (\eta ) = \tau \).

Definition 3

(Trace TSO-Serializability). A trace \(\tau \) is TSO-serializable iff every execution in \( Execs (\tau )\) is TSO-serializable.

The most important property of \(\tau = trace (\eta )\) for some execution \(\eta \) is that \(\tau \) can soundly be used to check if \(\eta \) is not TSO-serializable.

Proposition 2

If execution \(\eta \) is not TSO-serializable then the trace \( trace (\eta )\) is not TSO-serializable.

We introduce a conflict relation \({\mathop {\circledcirc }\limits ^{_{-\!\!\rightharpoonup }}}_{ T SO}\) for traces and a characterization of serializability based on that conflict relation. Intuitively, \({\mathop {\circledcirc }\limits ^{_{-\!\!\rightharpoonup }}}_{ T SO}\) stands for the union of the conflict relations for all the individual executions of that trace, where a write event represents both the write-issue and the corresponding write-commit. The relation \({\mathop {\circledcirc }\limits ^{_{-\!\!\rightharpoonup }}}_{ T SO}\) over traces is the union of two disjoint relations \(\overrightarrow{\circledcirc }_{ T SO}\) and \(\overline{\circledcirc }_{ T SO}\). Given \(e,e'\in \mathbb {E}\),

figure d

The conflicts between events of the same thread are included in \(\overrightarrow{\circledcirc }_{ T SO}\) since the order between such events is fixed in all the executions of the trace. Two events of different threads are in conflict if they are so under the classic SC conflict relation, and they are related by \(\overrightarrow{\circledcirc }_{ T SO}\) iff they are separated by a fence (since the fence ensures they are ordered in the same way in all executions) or if they are a non-buffered read (reading from the shared memory) together with a write (since a read cannot see the value of a write that hasn’t been issued yet). Formally, e and \(e'\) are fence-separated, denoted by \( fence (e,e')\), when e occurs before \(e'\), e is an action of thread t, and \(\tau \) contains a fence \( fn _{t}\) between e and \(e'\). In contrast, \(\overline{\circledcirc }_{ T SO}\) relates events that are conflicting under \(\circledcirc _{ SC }\) but may appear in different orders in different executions of a trace, for example two write events (of the same variable) performed by two different threads. Recall that a write represents both the write-issue and the corresponding write-commit.

Similar to the case of executions, having a graph theoretic characterization of serializability for traces is useful for algorithm design. We define the event-graph of a trace \(\tau \) that contains a directed edge from event e to event \(e'\) iff \(e \overrightarrow{\circledcirc }_{ T SO}e'\) and an undirected edge between e and \(e'\) iff \(e \overline{\circledcirc }_{ T SO}e'\).

Definition 4

(Trace Event-Graph). The event-graph of a trace \(\tau \) is the graph \(EG_{\tau } = \left\langle {V,E,U} \right\rangle \) where there is a node in V for each event in \(\tau \), E is a set of directed edges (uv) such that e(u) occurs before e(v) in \(\tau \) and \(e(u)\overrightarrow{\circledcirc }_{ T SO}e(v)\), and U is a set of undirected edges \(\{u,v\}\) such that e(u) occurs before e(v) in \(\tau \) and \(e(u)\overline{\circledcirc }_{ T SO}e(v)\) (where e(v) is the event of \(\tau \) corresponding to the node v).

Formally, an orientation of a graph \(G = \left\langle {V,E,U} \right\rangle \) with a set E of directed edges and a set U of undirected edges is a directed graph \(\left\langle {V,E\cup E'} \right\rangle \) such that for every undirected edge \(\{u,v\}\in U\), \(E'\) contains (uv) or (vu). An orientation of \(EG_{\tau }\) is valid when the resulting directed graph is acyclic.

The next result relates valid orientations of the trace event-graph and write-contractions of the trace’s executions event-graphs. Recall that the write-contraction of an event-graph \(EG_{\eta }\) is the graph \(EG^c_{\eta }\) where every node representing a write event \( wr _{t}(x)\) is merged with the node representing the corresponding write-commit event \( wr\text {-}com _{t}(x)\) (note that a contracted edge disappears and does not turn into a self-loop).

Theorem 4

For an execution \(\eta \in Execs (\tau )\), the write-contraction of \(EG_{\eta }\) is a valid orientation of \(EG_{\tau }\). Conversely, every valid orientation of \(EG_{\tau }\) is the write-contracted event-graph \(EG_{\eta }\) for some \(\eta \in Execs (\tau )\).

This leads to an interesting observation: \(EG_{\tau }\) of a trace \(\tau \) can be viewed as the union of the write-contractions \(EG^c_{\eta }\) of all \(\eta \in Execs (\tau )\), so that when all \(EG^c_{\eta }\)s agree on the direction of an edge between two nodes, that edge appears as a directed edge in \(EG_{\tau }\) and when at least two \(EG^c_{\eta }\)s disagree on the direction of an edge between two nodes, that edge appears as an undirected edge in \(EG_{\tau }\).

Also, Theorem 4 leads us to the following characterization of trace TSO-serializability based on orientations of trace event-graphs.

Theorem 5

A trace \(\tau \) is TSO-serializable iff every acyclic orientation of \(EG_{\tau }\) induces an acyclic conflict-graph.

Alternatively, one can directly define the notion of a conflict graph for traces. The event graph of a trace \(EG_{\tau }\) induces a graph over the transactions in the same sense as the conflict graph of an execution.

Definition 5

(Trace Conflict-Graph). The conflict-graph of a trace \(\tau \) is the graph \(CG_{\tau }=\left\langle {V',E', U'} \right\rangle \) where \(V'\) includes one node for each transaction in \(\tau \), and we have \((v,v') \in E'\) iff there exists actions \(a \in tr (v)\) and \(a \in tr (v')\) such that \((a,a') \in E\) and we have \(\{v,v'\} \in U'\) iff there exists actions \(b \in tr (v)\) and \(b' \in tr (v')\) such that \(\{b,b'\} \in U\) where \(EG_{\tau } = (V, E, U)\) is the event-graph of \(\tau \).

For instance, the conflict-graph of the trace of the execution in Fig. 2(a) is given in Fig. 2(d). Serializability of a trace \(\tau \) can be stated as a combined property of its conflict-graph \(CG_{\tau }\) and its event-graph \(EG_{\tau }\).

Corollary 1

Trace \(\tau \) is not TSO serializable iff there exists a cycle c in \(CG_{\tau } = \left\langle {V',E', U'} \right\rangle \) such that if \(\{u_1, \dots u_m\} \subseteq U'\) participate in c and \(\{e_1, \dots , e_m\}\) are the same set of edges oriented in the direction of the cycle, then there exists a valid orientation \(\left\langle {V,E''} \right\rangle \) of the event-graph \(EG_{\tau } = \left\langle {V,E, U} \right\rangle \) with \(\{e_1, \dots , e_m\} \subseteq E''\).

6 Monitoring TSO-serializability of Traces

In this section, we discuss the algorithmic aspect of monitoring traces for violations of TSO-serializability. Remember that (Sect. 3) monitoring one execution for violation of TSO-serializability is poly-time checkable.

Given a trace \(\tau \), we want to check whether \(\tau \) is TSO-serializable. We start by demonstrating that the general problem is NP-complete, and then propose polynomial time algorithms for approximations of this check. Specifically, we show that (i) under the assumption that the number of threads is a constant, there exists a sound and complete polynomial time algorithm that reports violations of TSO-serializability in a trace \(\tau \), and (ii) if the program is transaction-fenced, then TSO-serializability can be checked in polynomial time.

6.1 NP-Completeness of Trace TSO-serializability Checking

Theorem 5 provides an equivalent characterization of trace TSO-serializability, namely that every acyclic orientation of the trace event-graph induces an acyclic conflict-graph. It turns out that this check is NP-complete. We demonstrate this by reducing the known NP-complete problem of checking for the existence of a hamiltonian path in a given graph G to this problem.

Theorem 6

For a trace \(\tau \), the problem of checking whether \(\tau \) is TSO-serializable is NP-complete.

6.2 Fixed-Parameter Tractability

The good news is that there exists an algorithm for monitoring a trace for TSO-serializability violations which is polynomial time if one assumes the number of threads to be a constant. Given a trace of length n with k participating threads, it is easy to devise an exponential algorithm that finds a TSO-serializability violation if one exists and operates in \(O(n^k)\) time. However, considering that usually n (the number of events) is very large, it is desirable to have an algorithm with a running time where the exponent k does not appear over n, but over some constant instead.

In this section, we propose an algorithm of complexity \(O(n + c^k)\), where c is a constant that depends on the number of shared variables in the program, k is the number of threads, and n is the length of the trace. The main observation that gives rise to such an algorithm is that there is a concise witness to violation of TSO-serializability, and it suffices to search for the existence of such a witness algorithmically. We start by defining this concise witness, which always exists if an arbitrary witness exists.

Given the event graph \(EG_{\tau }\) of a trace \(\tau \), checking serializability of \(\tau \) reduces to deciding if there is a valid orientation of \(EG_{\tau }\) that induces a cycle over the conflict graph \(CG_{\tau }\). We will observe that if a valid orientation of \(EG_{\tau }\) induces a cycle, then this orientation induces a simple cycle (to be defined) over \(CG_{\tau }\).

Naturally, if the directed edges of the conflict graph \(CG_{\tau }\) already form a cycle (which can be checked in polynomial time on the size of the graph), then there is nothing left to be done; we have found our TSO-serializability violation witness. Therefore, we assume that \(CG_{\tau }\) is acyclic if it is restricted to its directed edges; let us call this graph \(\overrightarrow{CG_{\tau }}\). Similarly, \(\overrightarrow{EG_{\tau }}\) refers to \(EG_{\tau }\) restricted to its directed edges. We use the notation \(a \prec _\tau b\) to denote that \(\overrightarrow{EG_{\tau }}\) contains a path from event a to event b. Similarly, for transactions \(tr_1\) and \(tr_2\), we use the notation \(tr_1 \prec _\tau tr_2\) iff \(\overrightarrow{CG_{\tau }}\) contains a path from \(tr_1\) to \(tr_2\). The relation \(\prec _\tau \) captures the ordering constraints between events/transactions that are imposed by the directed conflict edges. For an event a, we use \( tr (a)\) to refer to the transaction that encloses a.

Let us assume that we have a cycle \(c = tr_{1} tr_{2} \dots tr_{m}\) over the conflict graph \(CG_{\tau }\). For each pair of consecutive transactions \(tr_{i}\) and \(tr_{i+1}\), let event \(b_{i}\) be the source and event \(a_{i+1}\) be the destination of the conflict edge between \(tr_i\) and \(tr_{i+1}\) that participates in the cycle (rotating back from \(b_m\) to \(a_1\)).

We say that cycle c can be simplified if there exist two transactions tr and \(tr'\) on it where \(tr \prec _\tau tr'\) and the segment of the cycle between tr and \(tr'\) contains at least one undirected conflict edge. By taking this segment of the cycle between tr and \(tr'\) and replacing it with the directed path (i.e. a path formed entirely of directed conflict edges) in the conflict graph from tr to \(tr'\), we simplify the cycle; we know that such a path exists by the definition of \(tr \prec _\tau tr'\). Intuitively, during simplification we get rid of undirected edges and replace them by directed paths; note that undirected edges are soft constraints in a trace which reflect that the order between two events is undetermined.

Definition 6

A simple cycle is a cycle that cannot be further simplified.

Below, we state two properties of simple cycles that are very useful for reducing the search space of our algorithm.

Proposition 3

In every simple cycle \(c = tr_{1} tr_{2} \dots tr_{m} tr_{1}\) over the conflict graph \(CG_{\tau }\) of a trace \(\tau \) (equivalently \(c = a_1 b_1 a_2 b_2 \dots a_m b_m a_1\) if the cycle is referenced by its conflict edges instead of its nodes) satisfies the following properties: (i) There exists at least one index k such that \(a_k \not \prec _\tau b_k\). (ii) Every two transactions tr and \(tr'\) that appear on c with an undirected edge somewhere in the middle of them (i.e. on the segment between tr to \(tr'\)) cannot belong to any chain (i.e. directed path) of the graph. In other words, we have \(tr \not \prec _\tau tr'\).

Property (ii) from the proposition above is straightforward yet significant because it implies that any simple cycle over the conflict graph can be viewed as a cycle where undirected edges connect segments of chains (i.e. directed paths) in the graph together, never visiting the same chain twice. We make use of the notion of profiles introduced in [11] for this algorithm. The idea is to summarize all possible entry/exits into each chain of \(\overrightarrow{CG_{\tau }}\) (that may participate in a simple cycle) as a set of pairs (of events), and look for cycles involving those pairs only.

Consider an event a of the event graph \(EG_{\tau }\). Let

$$\begin{aligned} pair (a) = \{ b|\ b \in tr (a) \ \vee \ tr (a) \prec _\tau tr (b)) \} \end{aligned}$$

The idea is that once a witness cycle enters \( tr (a)\) through a conflict edge with destination a, some \(b \in pair (a)\) is the event from which the cycle can leave the chain (i.e. directed path) that contains \( tr (a)\) and \( tr (b)\). In other words, \(\{a\} \times pair (a)\) is the set of all possible path segments that start with a and can be part of a simple cycle witnessing a violation of TSO-serializability.

Moreover, Proposition 3(i) states that at least for one transaction in the cycle we have a pair of events (ab) of the same transaction where \(a \prec _\tau b\) but where a and b participate in the witness cycle, which is directed from b back to a. The algorithm presented in Fig. 3 starts by enumerating all such pairs of events that belong to a single transaction (outermost loop). It then proceeds to find the matching entry/exit events (i.e. \(a'\) and \(b'\)) for the witness cycle in the chain containing a and b (the next nested loop). Finally, the innermost loop enumerates all possible choices of profiles for the remaining chains (other than the one containing \(a,b,a'\) and \(b'\)), and then the innermost statement checks if these choices form a valid witness cycle together.

Fig. 3.
figure 3

Algorithm for searching for all simple cycle witnesses. The choices of events for \(a,b,a',b'\) is over read and write events only. The chains \(\pi _1, \dots \pi _m\) are by definition disjoint.

The algorithm in Fig. 3 uses a function \( profile \) that returns the set of all profiles for a given chain. A profile of a chain is a set of elements of the following three forms: (i) a single event (a), when the witness conflict cycle enters and exits a chain at the same single event a, (ii) a pair of events (ab) of some transaction \( tr \), where the witness cycle enters/exits a chain at two events of the same transaction \( tr \), and (iii) a pair of events (ab), where a witness cycle enters a transaction in event a, then follows a chain of transactions on a directed path in the conflict graph and exits the chain through an event b (i.e. \( tr (a) \prec _\tau tr (b)\)). The set of profiles of a chain can be computed using the algorithm in Fig. 4.

Fig. 4.
figure 4

Algorithm for computing the set of all profiles of a transaction chain \(\pi \).

Soundness and Completeness. Here, we formally argue that it suffices for the algorithm to search for simple cycle witness to violation of TSO-serializability. The important observation is that:

Proposition 4

If a trace \(\tau \) is not TSO-serializable, then there exists a simple cycle witnessing the violation of TSO-serializability.

It remains to argue that the algorithm, through the use of profiles, will definitely find a simple cycle violation of TSO-serializability if one exists.

Proposition 5

For every partitioning \(\varPi \) of \(\overrightarrow{CG_{\tau }}\) into a set of chains, and every simple cycle violation of TSO-serializability c, we have that c visits every chain in \(\varPi \) at most once.

It is important to note that the above statement is independent of the choice of partitioning of \(\overrightarrow{CG_{\tau }}\) into chains. It is straightforward to see that a cycle’s footprint in every chain can be captured through one of the three possibilities that we introduced for profiles. Finally, we conclude the soundness and completeness of the algorithm in Fig. 3:

Theorem 7

Algorithm in Fig. 3 discovers a violation of TSO serializability in trace \(\tau \) iff one exists.

Complexity Analysis. A key observation about \(\overrightarrow{EG_{\tau }}\) is that for any trace \(\tau \), if \(\overrightarrow{CG_{\tau }}\) is restricted to a single thread and global read and write events, then the size of the largest anti-chain of it is at most 2. In other words, in every thread, there are at most two events a and b such that \(a \not \prec _\tau b\) and \(b \not \prec _\tau a\). This is a direct implication of the definition of \(\circledcirc _{ TSO }\); the only events that are not ordered in each thread are \( wr _{}(y)\) and \( rd _{}(x)\) when \(x \not = y\) , and the events appear in that order in the trace. Any other event that can be independent of \( wr _{}(y)\) will have to be a read event of some other variable, say \( rd _{}(z)\) which is in conflict with \( rd _{}(x)\) and therefore ordered with respect to it (similar argument for events independent of \( rd _{}(x)\)). We will make use of the following well-known theorem about the width of a partial order:

Theorem 8

(Dilworth’s Theorem). For every partial order, there exists an anti-chain A, and a partition of the order into a family P of chains, such that \(|P| = |A|\) (which is referred to as the width of the partial order). Moreover, such an A is the largest anti-chain in the order.

Since \(\overrightarrow{EG_{\tau }}\) is acyclic, by Dilworth’s Theorem, we know that it can be partitioned into at most p (maximal) chains (i.e. directed paths) where p is the size of the largest anti-chain of \(\overrightarrow{EG_{\tau }}\). The size of the largest anti-chain of \(\overrightarrow{EG_{\tau }}\) restricted to each thread (and ignoring the buffered reads) is at most 2. If we assume that there are k threads in the program, this implies that \(\overrightarrow{EG_{\tau }}\) (ignoring the buffered reads) can be partitioned into 2k chains. If we have m shared variables in the program, then each such chain can be summarized as at most \((2\,m)^2\) possible profiles (i.e. all possible combinations of \(2\,m\) reads and \(2\,m\) writes).

Now, let us add consideration for the buffered reads. In each thread, all buffered reads of the same variable are conflicting and form a chain. Therefore, in the worst case, we can account for all buffered reads of a single thread, by adding m extra chains, where each consists of all buffered reads of some variable x (there are at most m different variables). There are in total km of such chains for all k threads. However, every such chain (of buffered reads of x) can be represented by a single trivial profile \(( rd _{}(x))\).

Our algorithm ends up enumerating all possible profiles for such partitioning of \(\overrightarrow{EG_{\tau }}\) into a family of chains. There are at most \(((2\,m)^2)^{2k}\) different selection of profiles to consider. It is easy to see that it takes O(n) time (n is the length of the trace) to compute the set of all profiles.

We need to argue that given the combination of the fixed km (trivial) profiles and a choice of 2k profiles (from \(((2\,m)^2)^{2k}\) many choices), a violation can be found in polynomial time, if one exists. This is equivalent to having a system of (at most) \((m+2)k\) components, where each component is a single event, a pair of events connected by an undirected edge, or a pair of components linked by a directed edge. The goal is to find a cycle in this system that obeys the direction of the directed edges. A slightly modified depth-first search algorithm can find the cycle in time polynomial in mk.

To summarize, the complexity of the algorithm is \(O(n + c^k)\) where n is the length of the trace, c depends only on the number of shared variables in the program, and k is the number of program threads.

Theorem 9

For a program P with a fixed number of threads, the algorithm in Fig. 3 discovers a witness to violation of TSO-serializability of any trace of P in time polynomial on the length of the trace.

6.3 Poly-time Monitor for Transaction-Fenced Programs

An alternative way of avoiding the high complexity of monitoring traces for TSO-serializability violations, for instance when there is a large number of threads in the program, is to simplify this check by ensuring that every transaction ends with a fence event (and hence making all its updates visible to other threads when it ends). As stated in Theorem 3, TSO-serializability is equivalent to the conjunction of robustness and SC-serializability for such programs.

A witness to non-robustness of a program can be discovered through a targeted search (for a specific pattern of violations) in the space of SC-executions of the program [6] using an algorithm that works in polynomial time for a given execution. The combination of these two monitors, a poly-time monitor for SC-serializability and a poly-time monitor for robustness, gives rise to an efficient monitor for TSO-serializability that observes only SC-executions of a program and looks for robustness or SC-serializability violations. Every violation to TSO-serializability will manifest as an SC-serializability violation or as a robustness violation for a transaction-fenced program.

The advantages of this result are twofold: (i) when transactions are naturally fenced (e.g. a lot of Java library methods are like this), it provides a poly-time algorithm for monitoring TSO-serializability, and (ii) when transactions are not naturally fenced, and the program has a large number of threads (which limits the applicability of the algorithm in Sec. 6.2), it provides the programmer with a solution: namely, to insert a fence at the end of each transaction that is not already fenced, and gain an efficient sound and complete monitor for TSO-serializability. Having a transaction-fenced program has the additional advantage that it allows to reason about the more familiar notions of SC-serializability and robustness instead of directly reasoning about TSO-serializability.

7 Related Work

To the best of our knowledge, this paper provides the first definition of conflict serializability under TSO. Conflict serializability was introduced in [21] for database transactions. Decision procedures for conflict serializability of finite-state concurrent models executed under an SC semantics were proposed in [10, 11] and [5]. Both static [13, 17, 24, 26] and dynamic tools [12, 14, 23, 25] have been developed to check SC serializability, as well as transactional memory techniques that enforce serializability at run time [9, 16, 18, 22]. The non-atomicity of writes under TSO poses new algorithmic challenges for monitoring serializability. Since observing the detailed sequence of write issues and commits is not efficiently possible (without access to the cache coherence mechanism), any dynamic analysis needs to monitor executions with missing information, that effectively stand for sets of executions. We propose a new monitoring algorithm for traces (i.e. sets of executions) that searches for certain type of cycles in graphs with both directed and undirected edges, which is more challenging than the classic serializability monitor that searches for a cycle in a directed graph [21].

Linearizability has been studied for concurrent objects running under TSO [7, 15, 19]. This provides a means of establishing a relation between a concrete and an abstract object, which must hold in the context of every possible client of the object. The abstract object methods need not be atomic. In contrast, serializability is a property that is applicable to programs and the atomicity of a transaction is considered in the context of one specific program (in contrast to all possible clients).

Notions of robustness for TSO programs have been investigated in [24, 6, 8]. However, we are not aware of any work that establishes a relationship between robustness and atomicity under different memory models as done in this paper.