1 Introduction

Reversible computing studies computation models that exhibit both forward and backward determinism. The field has a long history: although usually motivated by the promise of lower energy consumption via the thermodynamics of computation, it is now increasingly seen as important in connection with quantum computing, which relies on reversible computing methods for significant parts of quantum circuit synthesis and design (e.g. as subroutines.)

A general task in reversible computation models is the simulation of irreversible functionality, and the key problem for simulation is dealing with erasure. Any data erased in a simulated irreversible program must be collected and assembled to preserve reversibility. This assembled garbage data takes up extra space both during and after the computation, and minimizing garbage has been a significant goal for reversible computation. General reversible simulation methods have been extensively studied, especially in terms of time/space tradeoffs (see e.g. [12] and references therein.) However, very little has been done in the field of reversible programming, not even for specific algorithm families, despite the importance of finding reversible algorithms that do not significantly deteriorate the asymptotics (i.e. use excessive time or space) compared to the irreversible versions. The problem is well-recognized in other subfields, e.g. in reversible and quantum logic synthesis and design, where minimizing the number of (output) garbage lines and ancilla (temporary) lines is a central challenge [2, 13].

In this paper we focus on developing a family of efficient reversible comparison sorts with asymptotically optimal (minimal) garbage, while also matching the running times of their irreversible counterparts. Reversible sorts have numerous applications, and analysis and programming techniques for reversible sorts are effectively reusable elsewhere. A key point is that we want the auxiliary garbage to be reusable between the various algorithms, to make the solutions as modular as possible. Now, for sorting an array of length n, it is not difficult to see that \(\mathrm {\Theta }(n \log n)\) bits (encoding a permutation) will suffice for the output garbage (see e.g. [12, Chap. 15]), but the central problem is how to achieve, in practice, efficient information storage and retrieval from minimized garbage at runtime. Using a general reversible simulation method (history embedding, or the input-copying compute-copy-uncompute method [3], see also below) will not work, as the garbage size will often exceed the lower bound: these methods always use a trace of the size of the running time, even when the optimal garbage is smaller.

Some prior work on reversible sorting exists: In \(\mathcal {MOQA}\) quicksort uses rank (a number in the range \([0,n!-1]\) representing the permutation) to ensure injectivity, although not actual reversibility [6]. Yokoyama et al. used ranks for generating a reversible insertion and merge sort [15]. Lutz and Derby used direct representations of permutations for a pioneering attempt at reversible bubble sort [9], although unfortunately their program does not work correctly. Hall used factorial representation (also in array form) in an input-copying out-of-place reversible insertion sort. Despite the worst-case number of comparisons being quadratic, the garbage size is in this way reduced to \(n\lceil \log _2 n\rceil ~(=\mathrm {\Theta }(n\log n))\) [7]. The Reverse C compiler [12, Chap. 10] generates reversible simulations of (a subset of) C programs, and is applicable to sorting programs. However, this tool defaults to using a history embedding in nearly all cases.

Based on this, we propose to use direct and factorial (factoradic) representations of permutations for both the intermediate and output garbage of reversible comparison sorts, and show how this facilitates their programming. The techniques apply to many different sorts, and the representations can be efficiently converted into each other (or into ranks) if needed. Our main contributions are:

  • The notions of faithful and hygienic reversible simulation are introduced: A faithful simulation incurs no asymptotic time overhead, and bounds the space overhead by some given function g(n). If g(n) cannot be lowered asymptotically, the simulation is called hygienic (Sect. 2.)

  • New hygienic reversible versions of sorting programs are developed for in-place insertion sort, bubble sort, and selection sort using factorial representation; and merge sort and quicksort using direct representation (Sect. 3.)

  • We uncover several unconventional relations among reversible sorts, where garbage and code are shared in novel ways, including an identity permutation trick that yields a two-pass reversible simulation (with optimal output garbage), for any comparison sort (Sect. 3.)

Programs can be run on the online interpreter at topps.diku.dk/pirc/sorts.

2 Preliminaries

In ordinary programming languages, we usually write programs that lose information during computation, and we have to do so when implementing many-to-one functions. However, in reversible programming languages (e.g. [9, 14]) information is preserved in each computation step, and so only injective functions can be implemented. A language is said to be r-Turing-complete if every injective computable function can be implemented in it [1, 3].

For any function \(f : X \rightarrow Y\), there exists an injective function \(f' : X \rightarrow Y \times G\) for some G such that for all \(x \in X\), \( fst (f'(x)) = f(x)\), where \( fst \) is the projection \( fst (x, y) = x\). We say that \(f'\) is an injectivization of f. The G output is only used to make \(f'\) injective, and is in this sense irrelevant to the original Y output. In reversible computing terms G is called output garbage.
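As a small illustration in Python rather than Janus (the function names here are ours, not the paper's): the non-injective absolute-value function can be injectivized by pairing its output with a sign bit as garbage, which makes it invertible.

```python
def f(x):
    # A non-injective function: both x and -x map to |x|.
    return abs(x)

def f_injective(x):
    # Injectivization: pair the output with garbage g recording
    # which preimage we came from (here, the sign of x).
    return abs(x), (x < 0)

def f_injective_inv(y, g):
    # The garbage bit makes the function invertible.
    return -y if g else y

# fst of the injectivization agrees with f, and the pair determines x.
assert all(f_injective(x)[0] == f(x) for x in range(-5, 6))
assert all(f_injective_inv(*f_injective(x)) == x for x in range(-5, 6))
```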

2.1 Reversible Simulations

Let IR be an (irreversible) language and R be a reversible language. A reversible R-program q is a reversible simulation of IR-program \(\texttt {p}\) iff \( fst ([\![ \texttt {q} ]\!]^\texttt {R}(x)) = [\![ \texttt {p} ]\!]^\texttt {IR}(x)\) for all x, i.e. \([\![ \texttt {q} ]\!]^\texttt {R}\) is an injectivization of \([\![ \texttt {p} ]\!]^\texttt {IR}\). A reversible simulation q of p is called clean if it produces only the original output of p, but no garbage, i.e. \([\![ \texttt {q} ]\!]^\texttt {R} : X \rightarrow Y \times \mathbbm {1}\) where \(\mathbbm {1}\) is the unit type.

Clearly, we can only have clean reversible simulations of programs that compute injective functions. For non-injective functions, we need a concept to describe that some reversible simulations behave better in terms of garbage than others. We say that q is a faithful reversible simulation of p with garbage bound \(g : \mathbbm {N} \rightarrow \mathbbm {N}\), if there is a constant c, which may depend on p (and R), but not on the program input x, such that the following three conditions hold:

  • bounded garbage output: \(| snd ([\![ \texttt {q} ]\!]^\texttt {R}(x))| \le c \cdot g(|x|)\) for all x,

  • no asymptotic time overhead: \( time _\texttt {q}^\texttt {R}(x) \le c\cdot time _\texttt {p}^\texttt {IR}(x)\) for all x,

  • at most g extra space: \( space _\texttt {q}^\texttt {R}(x) \le c \cdot ( space _\texttt {p}^\texttt {IR}(x) + g(|x|))\) for all x,

where |z| is the size of data z in the binary representation. Here, \( time _\texttt {p}^\texttt {L}(\mathtt {d})\) represents the number of execution steps (or application of semantic rules or some other reasonable measure of time) of a program p for an input d in a language L, and \( space _\texttt {p}^\texttt {L}(\mathtt {d})\) represents the maximum space usage (e.g. the size of the heap and stack) during the execution of a program p for an input d in a language L.

The first condition states that the garbage size is bounded by g(|x|). What we want to emphasize here is that g depends only on the size of x but not the content of x. The second condition states that the reversible simulation q has the same time complexity as the irreversible counterpart p. The intuition of the third condition is that any extra space is dedicated to garbage manipulation.

A faithful reversible simulation q of p is called hygienic with garbage bound g if there is no q' and \(h(n) = \mathrm {o}(g(n))\) such that q' is a faithful reversible simulation of p with garbage bound h. That is, a hygienic simulation is the best we can hope for: it is (asymptotically) optimal in its garbage usage, and it does not violate the time complexity of the program it simulates.

2.2 The Janus Reversible Programming Language

The reversible algorithms in this paper are implemented in an extended version [14] of the reversible programming language Janus developed in 1982 [9]. Janus is a simple imperative language with essentially C-style syntax, save a few key differences that ensure reversibility. To make the text as self-contained as possible, we here provide a brief introduction to the language.

As a concrete example, below is a Janus procedure to compute the factorial function. Given a natural number n \((\ge 0)\) and zero-cleared res (meaning res is 0), the procedure factorial sets res to n! with n left unchanged:

figure a

The base types of Janus variables are integers, and arrays and stacks of integers. The atomic statements (e.g. lines 2, 5, 7 above) are reversible updates of variables relative to their existing content, rather than absolute assignments. For this, C's shorthand for compound assignment (+=, -=, and ^=) is used. As an example, the compound assignment x += y*3 is allowed, but the simple assignment x := 3 is not. We require that the left-hand assignment target does not occur in the right-hand expression (or in the index expression if the left-hand side is an array cell), to avoid otherwise irreversible updates like x -= x. The swap statement x <=> y exchanges the values of two variables.

Control flow operators in Janus use runtime assertions to orthogonalize join points and ensure reversibility. Thus, the conditional statement if \(e_1\) then \(s_1\) else \(s_2\) fi \(e_2\) works almost like an ordinary if-then-else, but the expression \(e_2\) must evaluate to true when exiting the then branch, and false when exiting the else branch, and this is enforced at run-time. Similarly, the reversible loop statement from \(e_1\) do \(s_1\) loop \(s_2\) until \(e_2\) requires the entry assertion \(e_1\) to be true at the first entry to the loop, and false in every subsequent iteration. The do-statement \(s_1\) is executed between testing the entry and exit expressions, and the loop-statement \(s_2\) between exit and entry.

Local variables are declared using the local statement, initializing a (fresh) variable to the value of a given expression. For reversibility, these are paired with delocal (un)declarations, which remove variables from scope by zero-clearing them using programmer-given expressions, the correctness of which is enforced at run-time (see lines 3 & 10, and 4 & 6.) A call statement can be used for procedure calls using pass-by-reference parameters.
An unconventional feature of Janus is the possibility of inverse procedure invocation with the uncall statement, which runs the called procedure body backward.

Assume that factorial is invoked by a call. The initially zero-cleared res is set to one at line 2. Each loop iteration updates the value of res to \( res \times i\) using the temporary variable tmp, maintaining the invariant \(\{ res =i!\}\). The loop repeats until the index i reaches n. Then res is n!, and the local variable i is removed. Input and output of procedures (the values of the parameters) are related by the semantic function. For example, \([\![ \texttt {factorial} ]\!]^\texttt {Janus}(5,0)=(5,120)\), although keep in mind that parameters are still pass-by-reference.
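The forward and backward runs of this procedure can be sketched in ordinary Python (the paper's programs are in Janus; the function names here are ours). The backward run performs the exact inverse of each forward step, in reverse order, as an uncall would:

```python
def factorial_call(n, res):
    # Forward run: res must be zero-cleared on entry. In Janus this is
    # res += 1 followed by res *= i (via a temporary) for i = 1..n.
    assert res == 0
    res += 1
    for i in range(1, n + 1):
        res *= i
    return res

def factorial_uncall(n, res):
    # Inverse run (uncall): each step is inverted and executed in
    # reverse order; res ends zero-cleared.
    for i in range(n, 0, -1):
        res //= i
    res -= 1
    return res

assert factorial_call(5, 0) == 120
assert factorial_uncall(5, 120) == 0   # uncall restores the zero-cleared state
```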

For brevity, we shall employ a fair amount of syntactic sugar in this paper: this includes declaring/removing multiple local variables simultaneously (e.g. local int x = 0, int y = 1, ...). When the meaning of any further sugar is not intuitively obvious, this will be explained in the text.

Since we are dealing with reversible simulations of irreversible programs, for the base irreversible language we shall also use Janus, but with two small modifications: we add the irreversible statement emit( x ), which erases (zero-clears) x, and elide the uncall statement (as emits cannot be rolled back.) A central difficulty when writing reversible Janus procedures is to invent the proper assertions for conditionals and loops to specify which direction the control comes from, and to find proper expressions to deallocate local variables. Generally, the invention of assertions is known to be challenging (see [11], and references therein.) However, if we can emit data, it is straightforward to implement irreversible programs. An absolute assignment can be mimicked by initially zero-clearing the target with emit. For control flow, a fresh variable can be used to temporarily store where the control came from, and emitted afterwards. For example, to implement an irreversible conditional if \(e_1\) then \(s_1\) else \(s_2\), we allocate a fresh variable t before the reversible conditional, add the assignment t += 1 to the then branch, and use the assertion t = 1 to reversibly merge the branches. Afterwards, we discard t's value with emit(t) and deallocate. In similar fashion, one can implement ordinary irreversible loops, deletion of variables, and other irreversible behavior fairly easily by vigorous use of emit.

Code is written in typewriter font. Mathematical objects, semantic values and metavariables are written in \( italic ~ fonts \). We use a[m..n] to indicate the subarray of a[] with the \(n-m+1\) elements from the m-th to the n-th element (inclusive) if \(m \le n\), and the empty array otherwise. Array values are denoted in \( italic ~ fonts \) with subscripts; the value of a[i] is denoted \(a_i\).

3 Comparison Sorts

We shall consider reversible comparison sorts that take an unordered array of length \(n~(\ge 1)\) and return an ordered array (together with garbage.) Our aim is in all cases to implement reversible versions of existing comparison sorts such that the reversible programs perform the same number of comparison/exchange operations, on the same elements, and in the same order, as the irreversible comparison sort would on the same input.

Sorting an array of length two or greater is generally irreversible, in the sense that multiple input arrays are transformed into the same ordered array. If we have an array of distinct elements, the number of possible starting permutations is n!. To distinguish n! cases, in binary representation we need garbage of at least \(\lceil \log _2 (n!) \rceil \) bits (\(= \mathrm {\Theta }(n\log n)\).)
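A quick numerical check of this bound, as a Python sketch (the helper name is ours): for \(n=6\) there are \(720\) permutations, so at least 10 garbage bits are needed, and the bound grows as \(\mathrm{\Theta}(n \log n)\).

```python
import math

def garbage_lower_bound(n):
    # Minimum number of garbage bits needed to distinguish the n!
    # possible input permutations of n distinct elements.
    return math.ceil(math.log2(math.factorial(n)))

assert garbage_lower_bound(6) == 10   # ceil(log2(720)) = 10
# Theta(n log n) growth: log2(n!) >= (n/2) * log2(n/2), since n! >= (n/2)^(n/2).
assert all(garbage_lower_bound(n) >= (n // 2) * math.log2(n / 2)
           for n in range(2, 200))
```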

In this section we iteratively construct reversible simulations of major comparison sorts. We optimize the intermediate and output garbage down to the asymptotically optimal hygienic bound \(g(n)=\mathrm {\Theta }(n\log n)\).

3.1 Bubble Sort

Bubble sort rearranges the array a[0..n-1] in place to be in order. First, it compares the rear two values \(a_{n-1}\) and \(a_{n-2}\) and exchanges the contents of a[n-1] and a[n-2] if they are out of order. This continues with the preceding entries a[n-2] and a[n-3], a[n-3] and a[n-4], and so on. In the process, the smallest element is moved sequentially to the front, until it is placed in a[0]. This whole process is repeated on the (unordered) subarrays a[i..n-1] for \(i=0,1,\ldots ,{n-1}\). Each repetition moves the minimal element of the subarray to the first entry of a[i..n-1], so eventually all the elements are ordered.

Table 1. The operation of bubble sort for \(\{2,5,4,0,3,1\}\). The right table gives the factorial representation used in bsort4.

The left table of Table 1 shows how bubble sort works on a six-element array. The unsorted array at the top becomes a sorted one at the bottom. The underlined elements slide to the right by one position in each loop iteration. The elements under the jagged line are known to be at their final positions in the array.
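For reference, the comparison order described above can be sketched in Python (an irreversible rendition, without the reversibility bookkeeping; the name is ours): in pass i, the rear pairs are compared from a[n-1] down to a[i+1], moving the smallest element of a[i..n-1] to the front of that subarray.

```python
def bubble_sort(a):
    # Pass i: compare rear pairs a[j-1], a[j] for j = n-1 down to i+1,
    # bubbling the smallest element of a[i..n-1] to position i.
    n = len(a)
    for i in range(n):
        for j in range(n - 1, i, -1):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return a

# The sample array from Table 1.
assert bubble_sort([2, 5, 4, 0, 3, 1]) == [0, 1, 2, 3, 4, 5]
```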

A straightforward program for (irreversible) bubble sort is obtained by adding, to the ordinary bubble sort, assertions of reversible loops, reversible conditionals, reversible deallocation of variables using emits as outlined in Sect. 2.

figure c

The temporary variables t1–t3 are at least of size one, and i and j are at least of size \(\lceil \log _2 n \rceil \) and \(\lceil \log _2 (n-1) \rceil \). In the outer loop, t1 and j are emitted \(n+1\) and n times. In the inner loop, t2 and t3 are emitted \(\frac{1}{2}(n^2+n)\) and \(\frac{1}{2}(n^2-n)\) times. Variable i is emitted once. The number of emitted bits is \(\mathrm {\Omega }(n^2)\), and is thus greater than the asymptotic minimum \(\mathrm {\Theta }(n\log n)\) discussed above.

Many of the emit statements and the use of local variables can be removed by observing the counter variables i and j. The outer loop increments counter i from 0 to n, and the inner loop decrements counter j from \(n-1\) to i. Therefore, the assertions on the entries of the outer and inner loops can be replaced with i=0 and j=n-1, respectively, and the final values of the counter variables at the end of the loop execution can be used to deallocate them. This obviates the preceding emit statements and the use of local variables t1 and t2, but does not change the asymptotic number of emitted bits, because of the emission of the contents of t3. In what follows we shall use such simple refinements without further explanation.

History Embedding. In comparison sorts we compare and, if necessary, swap a pair of elements a and b. Since two precursor states (the orders ab and ba) are merged into a single state (unless \(a=b\)), one bit of information is lost. Specifically, in bsort_irev, the use of local variable t3 and the emission of its content cannot be obviated by local analysis. A simple method to reversibly compensate for the lost information is to add this bit to the garbage data. If we collect the information from each comparison as garbage, then the garbage (and space) use of the reversible simulation is \(\frac{1}{2}(n^2-n)\) bits. Using a stack gb for this yields the following code:

figure d

such that bsort1 is a reversible simulation of bsort_irev, i.e., for all a and n, \( fst ([\![ \texttt {bsort1} ]\!]^\texttt {Janus}(a,n)) = [\![\) \({\texttt {bsort\_irev}}]\!]^\texttt {Janus+Emit}(a,n)\). The number of garbage bits accumulated by bsort1 is \(\frac{1}{2}(n^2-n)\), provided that push pushes only one bit at a time onto stack gb. Note that the comparisons performed are exactly the same (and in the same order) as those of bsort_irev, so the time complexity is unaffected, and thus bsort1 is a faithful reversible simulation of bsort_irev with garbage \(\mathrm {\Theta }(n^2)\). This technique can be regarded as an instance of the Landauer embedding, cf. [1, 3].
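The embedding can be sketched in Python (the function names are ours; the paper's bsort1 is in Janus): one bit per comparison is pushed onto gb, and the inverse run pops the bits in reverse comparison order, restoring the original input.

```python
def bsort1(a):
    # History (Landauer) embedding: record one bit per comparison on a
    # stack gb, so the swap sequence can be replayed backward.
    n, gb = len(a), []
    for i in range(n):
        for j in range(n - 1, i, -1):
            swapped = a[j - 1] > a[j]
            if swapped:
                a[j - 1], a[j] = a[j], a[j - 1]
            gb.append(swapped)
    return gb

def bsort1_uncall(a, gb):
    # Inverse run: undo the comparisons in reverse order, popping gb.
    n = len(a)
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if gb.pop():
                a[j - 1], a[j] = a[j], a[j - 1]

a = [2, 5, 4, 0, 3, 1]
gb = bsort1(a)
assert a == [0, 1, 2, 3, 4, 5] and len(gb) == 15   # (n^2 - n)/2 bits for n = 6
bsort1_uncall(a, gb)
assert a == [2, 5, 4, 0, 3, 1] and gb == []        # input restored, stack empty
```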

Call-Uncall Convention. Such garbage can be reversibly canceled and replaced with the original input by what is known as the Bennett trick [3]. In Janus, the uncomputation (or Lecerf reversal) of a procedure call is realized by its inverse invocation (using an uncall) with the same parameters as the original call [14]:

figure e

Let k be the bit size of elements of the array a[]. The size of the output garbage for bsort2 is always nk. Note that this does not produce a faithful simulation with \(\mathrm {O}(nk)\) garbage, as the intermediate space usage by gb is still \(\mathrm {\Theta }(n^2)\).
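A Python sketch of this convention (names ours; the paper's bsort2 is in Janus): compute with garbage, copy the sorted output, then uncompute so that only a copy of the input remains as garbage. Note how the intermediate gb still peaks at \(\mathrm{\Theta}(n^2)\) bits, as observed above.

```python
def bsort1(a):
    # Bubble sort with a history-embedding bit stack (as in the text).
    n, gb = len(a), []
    for i in range(n):
        for j in range(n - 1, i, -1):
            swapped = a[j - 1] > a[j]
            if swapped:
                a[j - 1], a[j] = a[j], a[j - 1]
            gb.append(swapped)
    return gb

def bsort1_uncall(a, gb):
    # Inverse run: undo the comparisons in reverse order, popping gb.
    n = len(a)
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if gb.pop():
                a[j - 1], a[j] = a[j], a[j - 1]

def bsort2(a):
    gb = bsort1(a)          # compute: a is now sorted, gb holds Theta(n^2) bits
    out = list(a)           # copy the sorted result
    bsort1_uncall(a, gb)    # uncompute: a reverts to the input, gb empties
    return out, a           # output plus the input copy as garbage

out, garbage = bsort2([2, 5, 4, 0, 3, 1])
assert out == [0, 1, 2, 3, 4, 5]
assert garbage == [2, 5, 4, 0, 3, 1]   # nk bits of output garbage
```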

The Identity Permutation Trick. A variant of the call-uncall convention can return a (direct) permutation as garbage if it uncalls bubble sort bsort1 with an identity permutation p[]:

figure f

Since the sorted a[] and the identity permutation p[] have the i-th smallest element at index i, the invocation of bsort1(p,n,gb) takes exactly the same path as the inverse invocation of bsort1(a,n,gb), in the opposite direction. The size of the elements in p[] need not be larger than \(\lceil \log _2 n\rceil \), so the permutation can be simply represented in \(n \lceil \log _2 n \rceil \) bits \((=\mathrm {\Theta }(n\log n))\), which is asymptotically optimal. We refer to this programming technique as the identity permutation trick.

This observation yields a useful theorem for free: For any irreversible comparison sort, there is a reversible comparison sort that returns garbage in the form of a permutation of the same length as the input, with the same (asymptotic) running time for each input array.

Even though the garbage size at the end of this reversible simulation is asymptotically optimal, bsort3 has two shortcomings. First, bsort3, like bsort2, is a two-pass program: bsort1 is both called and uncalled in the body, and the counting up and down of counter variables in bsort1 is performed twice, which adds to the time overhead. Second, the reversible simulation is not hygienic: the intermediate garbage size \(\mathrm {\Theta }(n^2)\) is still asymptotically greater than the optimal garbage size \(\mathrm {\Theta }(n \log n)\). Still, this trick transforms intensional garbage, related explicitly to the inner workings of bubble sort, into extensional garbage, related only to the functionality of sorting in general.
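A Python sketch of the trick (names ours): sorting a collects the comparison bits, and the inverse run on an identity permutation consumes them, leaving a permutation that relates input positions to output positions.

```python
def bsort1(a):
    # Bubble sort with a history-embedding bit stack (as in the text).
    n, gb = len(a), []
    for i in range(n):
        for j in range(n - 1, i, -1):
            swapped = a[j - 1] > a[j]
            if swapped:
                a[j - 1], a[j] = a[j], a[j - 1]
            gb.append(swapped)
    return gb

def bsort1_uncall(a, gb):
    n = len(a)
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if gb.pop():
                a[j - 1], a[j] = a[j], a[j - 1]

def bsort3(a):
    n = len(a)
    gb = bsort1(a)            # sort a, collecting Theta(n^2) comparison bits
    p = list(range(n))        # identity permutation
    bsort1_uncall(p, gb)      # inverse run on p consumes gb entirely
    return p                  # garbage: a permutation, n*ceil(log2 n) bits

a0 = [2, 5, 4, 0, 3, 1]
a = list(a0)
p = bsort3(a)
assert a == sorted(a0)
# The permutation relates input and output: a0[j] == sorted(a0)[p[j]].
assert all(a0[j] == sorted(a0)[p[j]] for j in range(len(a0)))
```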

Fig. 1. Time/space of reversible bubble sorts.

Strictly speaking, before calling bsort3(a,n,p) we need to prepare an identity permutation of length n in the zero-cleared p[]. However, this only requires \(\mathrm {\Theta }(n)\) time and does not affect the asymptotic behavior of the reversible simulation. Thus, in the following discussion of complexity we can ignore this preprocessing.

Figure 1 provides a conceptual view of the intermediate garbage behavior of the reversible bubble sorts in this paper. The maximum space used by bsort1–bsort3 is proportional to the execution time. Procedures bsort2 and bsort3 have slightly different peaks and different garbage sizes at the end. This is because they use different garbage representations; in this particular diagram we assume that \(\log _2 n\) is smaller than the bitsize k of the elements we sort. During the latter half of the computation, bsort2 keeps an extra array of size nk for the output garbage, while bsort3 keeps a permutation of size \(n \lceil \log _2 n \rceil \). Later we shall see bsort4, which has output garbage of the same size as bsort3, but is a single-pass hygienic algorithm with (intermediate) garbage \(\mathrm {\Theta }(n \log n)\).

3.2 Insertion Sort

(Back-to-front) insertion sort maintains the ordered subarray a[\(j+1\)..n-1] and in each iteration (for \(j=n-1,\ldots ,0\)) adds a[j] to the subarray, by comparing and exchanging \(a_j\) with \(a_{j+1}\), \(a_{j+2}\), \(\ldots\), until a[j..n-1] is ordered. In the worst case insertion sort performs \(\frac{1}{2}(n^2-n)\) comparisons, but in the best case just \(n-1\). If we naively add a garbage bit for each comparison, i.e., apply the history embedding, then the (garbage) space complexity of the resulting reversible simulation is \(\mathrm {O}(n^2)\) and \(\mathrm {\Omega }(n)\). We can then further apply the techniques of the call-uncall convention and the identity permutation trick (as described above for bubble sort) to transform this reversible simulation into a program with a more useful output garbage representation. (We omit the concrete programs from this paper.) These reversible simulations are all faithful, but with (intermediate) garbage bound \(g(n)=n^2\), so they are not hygienic.
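A small Python check of these comparison counts (the helper name is ours): on a reverse-sorted five-element array the back-to-front insertion sort performs \(\frac{1}{2}(n^2-n)=10\) comparisons, and on an already sorted one only \(n-1=4\).

```python
def isort_comparisons(a):
    # Back-to-front insertion sort, counting comparisons.
    n, cmps = len(a), 0
    for j in range(n - 2, -1, -1):
        i = j
        while i + 1 < n:
            cmps += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                i += 1
            else:
                break            # insertion point found
    return cmps

assert isort_comparisons([5, 4, 3, 2, 1]) == 10   # worst case: (n^2 - n)/2
assert isort_comparisons([1, 2, 3, 4, 5]) == 4    # best case: n - 1
```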

To further optimize the intermediate garbage of reversible comparison sorts, and in particular in order to obtain a hygienic solution, it appears that we have to use both problem and algorithm specific knowledge.

Permutations as Intermediate Garbage. Our first attempt at a hygienic insertion sort is to use permutations not just as output, but also for the intermediate garbage data structure. Let p[] be initialized to the identity permutation. Every time we swap two elements in the input array a[], we swap the corresponding entries with the same indices in the permutation array p[]:

figure g

After each iteration of the inner loop of lines 5–9, the element initially stored at the j-th position is inserted at index i of the ordered subarray a[\(j+1\)..n-1], where the elements of a[\(j+1\)..i] have been shifted to the left by one element. The auxiliary procedure min(i,p,l,r) sets the (initially zero-cleared) i to the index k such that \(p_k\) is the minimum of \(p_l,\ldots ,p_r\), using \(\mathrm {O}(r-l)\) time and \(\mathrm {O}(r-l)\) space. Therefore, the inverse invocation of min(i,p,l,r) zero-clears i if \(p_i\) is the smallest element among \(p_l,\ldots ,p_r\). Since min(i,p,j,n-1) traverses all the elements in p[j..n-1] for any j, unfortunately isort1 takes \(\mathrm {\Theta }(n^2)\) time. Because insertion sort can be sub-quadratic for some inputs, isort1 is not a faithful reversible simulation of isort_irev. Here again, we ignore the preprocessing that sets p[] to the identity permutation. This is an example of how garbage manipulation can affect the asymptotic behavior of a reversible simulation.

However, we can improve on this by the following observation. Because the inserted element is originally stored at the j-th position in a[] and \(p_j=j\), \(p_i\) must be j after the insertion. Without changing the meaning, we can replace lines 11–12 with the statements in the comments. Call the resulting procedure isort2. Then, for each outer loop iteration, the two inner loops have exactly the same number of iterations. Therefore, isort2 takes \(\mathrm {\Theta }(n)\) time at best.
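The essence of the permutation-as-intermediate-garbage idea can be sketched in Python (names ours; the assertion machinery of isort1/isort2 is elided here): p[] starts as the identity, and every swap in a[] is mirrored at the same indices in p[].

```python
def isort_with_permutation(a):
    # Permutations as garbage: mirror every swap of a[] in p[], so that
    # p records the sorting permutation at the end.
    n = len(a)
    p = list(range(n))
    for j in range(n - 2, -1, -1):
        i = j
        while i + 1 < n and a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
            p[i], p[i + 1] = p[i + 1], p[i]
            i += 1
    return p

a0 = [2, 5, 4, 0, 3, 1]
a = list(a0)
p = isort_with_permutation(a)
assert a == sorted(a0)
# p maps output positions back to input positions: a[i] == a0[p[i]].
assert all(a[i] == a0[p[i]] for i in range(len(a)))
```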

The intermediate and output garbage of this insertion sort, the permutation p[] of length n, can be represented in \(\mathrm {\Theta }(n \log n)\) space (which is the hygienic bound.) Since the asymptotic time complexity of isort2 and isort_irev is the same for each input (a, n), there is some constant c such that \( time _\texttt {isort2}^\texttt {Janus}(a,n) \le c\cdot time _\texttt {isort\_irev}^\texttt {Janus+Emit}(a,n)\) for any a and n. Therefore, the reversible program isort2 is a hygienic reversible simulation of isort_irev. It should be noted that the notion of faithfulness is quite sensitive to the time behavior of the simulated program. For instance, although bsort3 is a faithful bubble sort, it is not a faithful reversible simulation of insertion sort: even though both are \(\mathrm {O}(n^2)\), on some inputs insertion sort will be linear, which breaks the condition that a faithful simulation must preserve the time complexity on all inputs.

Unfortunately, we see that isort2 uses two inner loops to simulate a single inner loop of isort_irev in two passes. Next, we shall change the permutation representation which leads to a one-pass inner loop.

Factorial Representation as Garbage. Hall observed that although the worst-case number of comparisons is \(\mathrm {\Theta }(n^2)\), the intermediate garbage size of an input-copying reversible insertion sort can be reduced to \(\mathrm {\Theta }(n\log n)\), since the outcomes of the comparisons uniquely define a permutation in factorial representation [7]. We apply this idea to an in-place reversible insertion sort:

figure h

Given an unordered array a and an initially zero-cleared array d, procedure isort3 returns the ordered array a together with garbage array d. For each i, array element \(d_i\) records how many times the element initially placed at index i of array a is moved to the right when it is inserted into the ordered subarray a[\(i+1\)..n-1] (when \(i=n-1\) the ordered subarray is empty, and therefore \(d_{n-1}\) is always zero). That is, \(d_i\) is equal to the number of elements smaller than \(a_i\) in the initial subarray a[\(i+1\)..n-1]. By this construction, array d is a (decreasing) factorial representation of the sorting permutation, with \(0 \le d_i \le n-i-1\) for \(0 \le i \le n-1\) [4], and has n! distinct values. If each element of d is of size \(\lceil \log _2 n \rceil \), the garbage size is the asymptotically optimal \(\mathrm {\Theta }(n\log n)\). A factorial representation can be efficiently transformed into an integer, or rank, of the permutation, with the same number of bits [4].

Table 2. The intermediate arrays and decreasing factorial representation of insertion sort isort3 for a sample array \(\{2,5,4,0,3,1\}\).

Table 2 shows how insertion sort isort3 works on a six-element array. In the left table, the unordered array a at the top becomes an ordered array at the bottom. The elements under the diagonal line are already known to be sorted. In the right table, the zero-cleared array d at the top becomes a factorial representation at the bottom. The elements under the diagonal are known to be the final values. Thus, in insertion sort, each element of d is set only once and never changed. In the left table, the element just left of the line is inserted into the ordered subarray on the right, where the underlined elements have been moved one position to the left by the previous insertion. Note how this corresponds directly to the value to the right of the diagonal in d: at the m-th iteration, the number of interchanged elements is stored in d[\(n-m-1\)]. This value \(d_{n-m-1}\) is used to directly deallocate the counter variable i of the inner loop at line 9 in isort3. Intuitively, if we know how deep an element is inserted, we can uniquely determine which element it is and what the previous ordered subarray was.

Thus, with optimal garbage, and only constant overhead in each loop iteration, the reversible program isort3 is a hygienic reversible simulation of the irreversible insertion sort isort_irev.
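A Python sketch of isort3 and its inverse (names ours; the Janus original derives its loop assertions from d rather than using a separate inverse function), checked on the sample array of Table 2:

```python
def isort3(a):
    # In-place insertion sort recording, for each element, how far it
    # moves right on insertion: d is the decreasing factorial representation.
    n = len(a)
    d = [0] * n
    for j in range(n - 2, -1, -1):
        i = j
        while i + 1 < n and a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
            i += 1
        d[j] = i - j          # number of exchanges = insertion depth
    return d

def isort3_uncall(a, d):
    # Inverse run: d[j] tells each inner loop exactly how far to unwind,
    # which is what makes the simulation reversible with small garbage.
    n = len(a)
    for j in range(n - 1):
        for i in range(j + d[j], j, -1):
            a[i - 1], a[i] = a[i], a[i - 1]
        # (in the reversible program, d[j] is zero-cleared here)

a = [2, 5, 4, 0, 3, 1]
d = isort3(a)
assert a == [0, 1, 2, 3, 4, 5] and d == [2, 4, 3, 0, 1, 0]
isort3_uncall(a, d)
assert a == [2, 5, 4, 0, 3, 1]   # the original input is recovered
```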

A dual to the decreasing factorial representation we use is known as the inversion table, whose i-th element contains the number of elements to the left of i that are greater than i in the original array. Inversion tables can be used for analyzing properties of (conventional) sorting algorithms [8, Chap. 5].

3.3 Hygienic Bubble Sort

Building on the hygienic reversible insertion sort development we shall now define a hygienic reversible bubble sort. The key to this is to consider the relation between the two algorithms as sorting networks.

Insertion sort and bubble sort are identical when considered as parallel sorting networks [8, Sect. 5.3.4]. Each vertical line in such a network carries an array element, and each horizontal edge is a comparison/swap operation. An unordered array is input at the top and a sorted result is obtained at the bottom. Figure 2 shows the network (twice) in action on a sample array; horizontal edges with arrowheads indicate that the elements on the lines were swapped, and edges without arrows indicate that they were not. In Fig. 2(b) traversing the (beige) boxes from the upper left to the lower right, and in each box from upper right to lower left, gives the order of comparison/swap operations exactly as done in a bubble sort. In Fig. 2(a) traversing the (gray) boxes from the upper right to the lower left, and each box from upper left to lower right, leads to an insertion sort. Our hygienic reversible insertion sort counts the number of exchanges for each gray box in Fig. 2(a) and stores this in array d, which is exactly the factorial representation at the end.

Fig. 2. A parallel sorting network in action as both insertion sort and bubble sort.

The factorial representation used in insertion sort contains exactly enough information to compensate for the information lost by each comparison/swap operation of bubble sort. In particular, note that the value of the intermediate factorial representation for some specific comparison will be the same, regardless of whether we build the representation in the order of the gray boxes in Fig. 2(a) or the beige boxes in Fig. 2(b). Thus, we can build the same factorial representation as in insertion sort, but in the order of the bubble sort comparisons, and use this for the assertions we need.
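This claim can be checked with a Python sketch (names ours). Here an auxiliary array of original element indices stands in for the assertions that bsort4 derives from d itself, so this is an illustration of the order-independence, not the paper's technique: each bubble swap removes exactly one inversion, whose larger element determines which entry of d to increment.

```python
def bubble_factorial(a):
    # Build the factorial representation d in bubble-sort comparison order.
    # orig tracks each element's original index (an assumption of this sketch).
    n = len(a)
    d = [0] * n
    orig = list(range(n))
    for i in range(n):
        for j in range(n - 1, i, -1):
            if a[j - 1] > a[j]:
                d[orig[j - 1]] += 1    # one inversion of the larger element removed
                a[j - 1], a[j] = a[j], a[j - 1]
                orig[j - 1], orig[j] = orig[j], orig[j - 1]
    return d

def isort_factorial(a):
    # Reference: the factorial representation built by insertion sort.
    n = len(a)
    d = [0] * n
    for j in range(n - 2, -1, -1):
        i = j
        while i + 1 < n and a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
            i += 1
        d[j] = i - j
    return d

# Both comparison orders yield the same representation (sample from Table 1).
a0 = [2, 5, 4, 0, 3, 1]
assert bubble_factorial(list(a0)) == isort_factorial(list(a0)) == [2, 4, 3, 0, 1, 0]
```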

We show the solution in the form of the reversible program bsort4:

figure j

Let us return to Table 1, which shows how bsort4 behaves. The underlined elements in the left table are those slid to the right in each iteration, i.e. those for which the horizontal edges in Fig. 2 have arrows. In the right table we see how the corresponding factorial representation is iteratively built in the order of the bubble sort comparisons, cf. the beige boxes in Fig. 2(b). As a subsidiary result, this means that call isort1(a,n,d); uncall bsort4(a,n,d) will be the identity, as the two procedures generate exactly the same output garbage.

While it was not obvious how to compress the run-time space usage of a reversible bubble sort, the connection to insertion sort provided the key insight of using the factorial representation for this as well. The reversible program bsort4 is a hygienic reversible simulation of bsort_irev.

3.4 Selection Sort

Another closely related sorting algorithm is selection sort. Selection sort repeatedly finds the smallest element of the remaining unordered subarray and appends it to the ordered subarray, front to back. Using the programming techniques developed above, it is not difficult to construct a hygienic reversible selection sort with garbage in the form of a permutation or factorial representation. Here, we instead show a non-trivial relationship: the non-faithful reversible insertion sort isort1 can be directly used to realize a (hygienic) reversible selection sort, through code sharing.

In isort1, the initial p[] does not actually need to be the identity permutation; the algorithm will still work if p[] is simply an ordered array. Thus, given an unordered array a[] and an identity permutation p[], call isort1(a,n,p) is functionally equivalent to uncall isort1(p,n,a). Further, in such an inverse invocation of isort1, the inverse procedure invocation uncall min(i,p,j,n-1) in the inner loop in line 11 becomes call min(i,p,j,n-1), which exactly identifies the minimum element \(p_i\) of the remaining subarray p[j..n-1]. The other inner loop of isort1 will move this element to \(p_{j}\). Therefore, given the procedure isort1, we can realize a reversible selection sort simply by swapping a[] and (identity permutation) p[] and inverting the sorting process:

figure k

Thus, although isort1 was not even a faithful reversible insertion sort, this is a hygienic reversible simulation of the corresponding irreversible selection sort.

Note that the intermediate garbage data of isort1 and ssort in the forward direction are different, as procedure isort1 generates its ordered subarray from back to front while ssort does this front to back. Moreover, if call ssort(a,n,p) is stable, so is uncall ssort(p,n,a), and vice versa (where min(i,p,j,n-1) sets i to the index of the leftmost minimum element). This is because elements of the same value are not interchanged by the loop in lines 5–9 in isort1.
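The call/uncall semantics above cannot be expressed directly in an irreversible language, but the data flow of a stable selection sort with permutation garbage can be sketched. The following Python functions are our own illustration (not the paper's ssort of figure k): selecting the leftmost minimum and rotating it into place, rather than swapping, preserves stability, and the recorded permutation suffices to invert the sort.

```python
def ssort_with_garbage(a):
    """Stable selection sort recording permutation garbage:
    p[i] = original position of the element ending up at index i."""
    a = list(a)
    p = list(range(len(a)))
    for j in range(len(a)):
        k = min(range(j, len(a)), key=lambda i: a[i])  # leftmost minimum
        a[j:k + 1] = [a[k]] + a[j:k]   # rotate, not swap: keeps equal elements in order
        p[j:k + 1] = [p[k]] + p[j:k]
    return a, p

def unsort(s, p):
    """Invert the sort using only the sorted array and the garbage p."""
    orig = [None] * len(s)
    for i, k in enumerate(p):
        orig[k] = s[i]
    return orig
```

On input `[3, 1, 2, 1]` this yields the sorted array `[1, 1, 2, 3]` with garbage `p = [1, 3, 2, 0]`; the two equal elements keep their original relative order (p-entries 1 before 3), and `unsort` restores the input exactly.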

3.5 Merge Sort

Merge sort follows the divide-and-conquer approach. It recursively divides the array in the middle until we have arrays of length zero or one. Such subarrays are already ordered and thus conquered. Then, it recursively merges two ordered arrays into a single ordered array, until, eventually, the entire array is ordered.

Since merge sort runs in \(\mathrm {\Theta }(n\log n)\) time, its history embedding reversible simulation is already hygienic. However, such garbage depends on the algorithm (and implementation), preventing us from reusing garbage across algorithms. Further, the hidden constant of such a history embedding may be quite large. In this subsection, we therefore construct a reversible (stable) merge sort with minimal garbage in the form of a permutation in direct representation.

Fig. 3. Merging two sorted subarrays.

The reversible procedure merge in Fig. 3 merges the ordered subarrays a[l..m] and a[m+1..r] into the ordered subarray a[l..r]. It takes a[l..r], the permutation p[l..r], and the middle index m between l and r as parameters. We allocate zero-cleared temporary arrays b[], c[], p_b[], and p_c[] of length \(r+1\) at line 2 (see Footnote 5), and deallocate them at line 33. The first and second loops move the elements with index \(l,\ldots ,m\) in the data array a[] and the permutation p[] to b[] and p_b[], and those with index \(m+1,\ldots ,r\) to c[] and p_c[]. Line 16 sets sentinels (used to avoid considering empty arrays) at the end of the subarrays b[] and c[]; we assume INF is greater than any other element in a[]. The loop in lines 19–29 iteratively compares the heads of the subarrays b[] and c[], and moves the smaller element back to array a[]. Accordingly, the corresponding permutation elements in p_b[] and p_c[] are moved back to the original array p[]. Now, before the procedure call, p[l..m] will contain the (distinct) values \(\{l,\ldots ,m\}\) and p[m+1..r] will contain \(\{m+1,\ldots ,r\}\), although not necessarily in order. Since every permutation value originating from b[] is thus smaller than every value originating from c[], line 27 can correctly disambiguate the incoming control flow. Furthermore, after the procedure call, p[l..r] contains the values \(\{l,\ldots ,r\}\). When all the elements have been moved back to the original array a[l..r], now in order, the sentinels are zero-cleared at line 32.
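The disambiguation idea can be illustrated in irreversible Python (a sketch under our own naming, without the sentinels and zero-clearing of the reversible program): the forward merge needs no extra garbage beyond p, because in the backward direction the test p[k] <= m alone tells which run each element came from.

```python
def merge(a, p, l, m, r):
    """Stably merge sorted runs a[l..m] and a[m+1..r]; p carries original indices."""
    b, pb = a[l:m + 1], p[l:m + 1]
    c, pc = a[m + 1:r + 1], p[m + 1:r + 1]
    i = j = 0
    for k in range(l, r + 1):
        # take from b while it has elements and its head is <= c's head (stability)
        if i < len(b) and (j >= len(c) or b[i] <= c[j]):
            a[k], p[k] = b[i], pb[i]; i += 1
        else:
            a[k], p[k] = c[j], pc[j]; j += 1

def unmerge(a, p, l, m, r):
    """Inverse of merge: p[k] <= m identifies elements of the left run,
    and a stable merge preserves each run's internal order."""
    b, pb, c, pc = [], [], [], []
    for k in range(l, r + 1):
        if p[k] <= m:
            b.append(a[k]); pb.append(p[k])
        else:
            c.append(a[k]); pc.append(p[k])
    a[l:m + 1], p[l:m + 1] = b, pb
    a[m + 1:r + 1], p[m + 1:r + 1] = c, pc
```

Merging `a = [1, 3, 2, 4]`, `p = [0, 1, 2, 3]` with `l=0, m=1, r=3` gives `a = [1, 2, 3, 4]`, `p = [0, 2, 1, 3]`, and `unmerge` restores both arrays exactly, using only the information already in p.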

We use the procedure msort_sub to sort the elements in subarray a[l..r]:

figure l

If \(l<r\), i.e., the subarray a[l..r] has two or more elements, the former half and the latter half elements are independently sorted by two procedure calls of msort_sub. The results are merged by procedure merge.

Finally, procedure msort is a wrapper procedure that passes the indices of the first element (0) and the last element (\(n-1\)) of the arrays a[] and p[]:

figure m

where p[] is initially an identity permutation. (This means that the assertion at line 27 of merge never fails.) msort is a hygienic reversible simulation of the corresponding conventional irreversible merge sort.

3.6 Quicksort

Like merge sort, quicksort uses the divide-and-conquer method. We construct a stable reversible quicksort based on simple pivot selection.

Procedure partition processes the subarray a[l..r]. It uses the rightmost element a[r] as a pivot, rearranges a[l..r] into a[l..q-1], whose elements are smaller than or equal to the pivot, and a[q+1..r], whose elements are greater than the pivot, and moves the pivot to a[q]. The permutation array p[] preserves the information lost in the division; during the dividing process, the elements of p[] are moved alongside those of a[], such that whenever a[i] is set, p[i] contains the original position of the element placed there. p[l..r] is assumed to be in increasing order on entry to partition, so in particular, the permutation element p[r] corresponding to the pivot element a[r] will be the maximum value in p[l..r]. This will be important below. The procedure uses two auxiliary arrays, t[] and p_t[], for manipulating the data; p_t[i] holds the original position of t[i] whenever t[i] is set.

figure n

The assertion at line 12 relies on short-circuit evaluation: if q >= l does not hold (in case no element in the subarray turned out to be greater than the pivot a[r]), then j-q-1 = -1 and p[q] > p_t[j-q-1] are never evaluated. Dually, if j-q-1 = -1 holds, p[q] > p_t[j-q-1] is never evaluated. Finally, in case we did swap elements, this is reflected in whether p[] or p_t[] holds the larger index.

To guarantee that the assertion at line 12 always holds, we rely on the assumption that p[l..r] is in increasing order before the call to partition. Note that, in particular, p[l..q] and p[q+1..r] will then be increasing subarrays after the call, which we exploit recursively in the qsort procedure.

Procedure partition uses the auxiliary arrays t[] and p_t[] of size \(r-l+1\), and runs in \(\mathrm {\Theta }(r-l)\) time. Unfortunately, partition is not in-place, contrary to conventional irreversible quicksort programs. Still, because p[q] holds the maximum, we can erase the pivot pointer q after each partition.

The following procedure implements reversible quicksort:

figure o

The top-level call to quicksort is qsort(a,0,n-1,p), where p[] is an identity permutation. Because of this and the pre- and postconditions on p[l..r] of partition, the recursive calls to qsort each consider a subarray p[l..r] in increasing order. Therefore, the assertions in partition never fail. The only remainder after the recursive calls is the pivot pointer q, but by construction this points to the largest permutation element (initially in p[r]), and is thus easily identified and removed. For this we use the procedure max, which is completely analogous to the min procedure used in isort1. This provides us with a hygienic faithful reversible simulation (although not in-place) of the corresponding conventional irreversible quicksort (using the same choice of pivots).
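The erasability of q can be demonstrated with an irreversible Python sketch (our own naming; the reversible assertions and uncomputation are replaced by an explicit check): a stable partition that carries the permutation along, with the invariant that q is always the argmax of p[l..r] afterwards, so a reversible version can clear q against the max procedure.

```python
def partition(a, p, l, r):
    """Stable partition of a[l..r] around pivot a[r].
    Assumes p[l..r] is increasing on entry (so p[r] is its maximum)."""
    pivot, ppivot = a[r], p[r]
    t, pt = [], []                      # temporaries for elements > pivot
    q = l - 1
    for i in range(l, r):
        if a[i] <= pivot:
            q += 1
            a[q], p[q] = a[i], p[i]     # q <= i, so unread slots are never clobbered
        else:
            t.append(a[i]); pt.append(p[i])
    q += 1
    a[q], p[q] = pivot, ppivot          # pivot lands at q; p[q] is the max of p[l..r]
    for j in range(len(t)):             # move the "greater" elements back after the pivot
        a[q + 1 + j], p[q + 1 + j] = t[j], pt[j]
    return q

def qsort(a, p, l, r):
    if l < r:
        q = partition(a, p, l, r)
        # q is recoverable as the argmax of p[l..r], so a reversible
        # version can erase it (cf. the max procedure in the text)
        assert q == max(range(l, r + 1), key=lambda k: p[k])
        qsort(a, p, l, q - 1)
        qsort(a, p, q + 1, r)
```

Both recursive subarrays of p remain increasing (each is an in-order subsequence of an increasing sequence), so the argmax check holds at every level of the recursion.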

4 Concluding Remarks

In this paper we showed reversible programming techniques to enable reversible comparison sort programs to be more efficient, in terms of both time and space usage, than ones generated by general reversible simulation [3], even by asymptotic orders of magnitude. This was facilitated by unique reversible programming techniques such as efficient garbage representation and unconventional code sharing. We developed a family of reversible comparison sorts with better performance than previously known reversible sorts in terms of space usage, number of passes, and/or garbage representation. The resulting programs include several hygienic faithful reversible simulations of comparison sorts, i.e. reversible implementations which have the same time complexity as their irreversible counterparts, and optimal (minimal) space overhead.

It turns out that certain garbage representations are more suitable for programming some types of sorts than others; factorial permutation representation is suitable for reversible bubble sort and direct permutation representation is suitable for reversible quicksort, but the converse does not appear to be the case. Furthermore, the direct and factorial representations can be easily interpreted, which aids modularity between sorts, while other forms, such as rank, usually cannot. Still, these garbage representations can be efficiently and reversibly transformed to each other by clean reversible simulations of existing translations (e.g. [4, 5, 10], see Appendix A.) Choosing between these garbage representations is thus not essential from the viewpoint of asymptotic efficiency.
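As an illustration of such a translation, here is a naive quadratic Python sketch converting between a direct permutation and one Lehmer-code convention of the factorial representation (our convention for illustration; the paper's cited translations are asymptotically more efficient, and its exact digit convention may differ):

```python
def perm_to_lehmer(p):
    """d[i] = number of later entries smaller than p[i]: a factorial-base
    digit sequence, since 0 <= d[i] <= n-1-i."""
    n = len(p)
    return [sum(1 for j in range(i + 1, n) if p[j] < p[i]) for i in range(n)]

def lehmer_to_perm(d):
    """Inverse translation: pick the d[i]-th smallest unused value."""
    remaining = list(range(len(d)))
    return [remaining.pop(k) for k in d]
```

For example, `perm_to_lehmer([2, 0, 3, 1])` gives `[2, 0, 1, 0]`, and `lehmer_to_perm` maps it back; the round trip is the identity on any permutation, so either form can serve as the garbage.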

We believe the concepts identified here are also useful in other domains than comparison sorts. The notions of faithfulness and hygienicity can serve as criteria for judging the efficiency of reversibilized programs, and appropriate garbage representations can facilitate and guide reversible programming in general. However, this study also shows that reversible algorithmics cannot rely on simply combining generic reversible computing techniques with irreversible algorithms; domain analyses are required to get efficient solutions.