The optimal structure of algorithms for α-paging
Introduction
Flash memory combines the advantages of semiconductor-based memory and non-volatile data storage devices: it allows fast random read access and requires no power supply for archiving data. Compared to mechanical hard disks, flash memory devices are lighter, more shock-resistant and consume less power. Due to their decreasing price they have become an economically competitive alternative in many areas (in particular for mobile computing). Like most other storage technologies, flash memory works block-based. However, modifying a block on flash memory typically requires rewriting a number of neighboring blocks as well. Therefore, best writing performance is achieved by sequentially writing several neighboring blocks at once, whereas reading can be done efficiently by reading only a single block at a time [2]. We consider this read/write asymmetry for the paging problem on two-level memory hierarchies. Paging strategies decide which memory pages reside in the fast and small cache (e.g. RAM) and which have to be loaded upon access from a slower and larger disk (e.g. a flash memory device).
Provided with a cache of fixed size k and a disk of infinite size we have to serve a sequence of requests to memory pages of equal size. If the currently requested page is in the cache we are done; otherwise the requested page needs to be loaded from the disk into the cache, and we say that a page fault occurs. Upon a page fault the paging algorithm needs to decide which page to replace, if the cache is full. In the classical paging problem the goal is to minimize the number of page faults. There exist many variations of this simple core problem, e.g. using page weights [9], varying the cache size [14] or requesting sets of pages instead of single pages [8].
In the α-paging model [13] it is assumed that pages evicted from the cache need to be written back to disk and that this is more efficient if done in bundles. Upon one eviction the algorithm is allowed to write out up to α arbitrary pages in order to make room for new ones. In contrast to the classical model the cost for α-paging is given by the number of evictions.
More precisely, upon a request to page p the algorithm pays cost 0 for loading p into the cache if needed. Since the cache size is bounded by k, the algorithm may need to perform evictions to make room for new pages. In each step an arbitrary number l of bundles, each containing at most α pages, can be evicted at cost l (the number of bundles). Although the model permits evictions at every page request, it was shown that it suffices to consider only lazy algorithms [13]. Lazy algorithms evict at most one bundle, and they do so iff the cache is full and the requested page is not in the cache.
Simple algorithms like LRU can be adapted in a straightforward way, e.g. α-LRU evicts the α least-recently requested pages instead of only the least-recently requested page. Note that for α = 1 the model is equivalent to the classical paging model (up to a constant additive cost due to the different cost measure).
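The lazy eviction rule for α-LRU can be made concrete with a short Python sketch. This is an illustration under the model's assumptions (cost = number of evicted bundles, loading is free), not the authors' implementation:

```python
from collections import OrderedDict

def alpha_lru(requests, k, alpha):
    """Lazy alpha-LRU sketch: on a fault with a full cache, evict one
    bundle of the alpha least-recently requested pages at cost 1."""
    cache = OrderedDict()          # pages ordered from least to most recent
    evictions = 0                  # cost = number of evicted bundles
    for p in requests:
        if p in cache:
            cache.move_to_end(p)   # hit: refresh recency, no cost
            continue
        if len(cache) == k:        # lazy: evict only when full and faulting
            for _ in range(min(alpha, len(cache))):
                cache.popitem(last=False)
            evictions += 1
        cache[p] = None            # load the requested page for free
    return evictions
```

For example, on the sequence 1, 2, 3, 4 with k = 2 the algorithm pays cost 2 for α = 1 but only cost 1 for α = 2, since the single bundle evicted at the fault on page 3 already makes room for page 4 as well.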
Note that α-paging captures some but not all characteristics of flash memory devices in paging scenarios. An important aspect concerns the degree of freedom to choose a set of α jointly evicted pages. If those were to be written back to their original positions on flash memory, improved writing behavior would only occur if these positions happen to be adjacent on the device. In practice, however, so-called flash translation layers like the ones used in EasyCo [7] or ExtremeFFS [17] efficiently overcome this issue by re-mapping the logical positions of those α flash pages so that they can be physically written in neighboring device positions whereas their old positions are internally marked to be invalid. Occasional compaction phases rearrange invalid slots so that they can be reused in consecutive fashion. Another issue not captured by this model is that reading pages from flash memory is not for free in practice.
The most prominent way to analyze online algorithms is competitive analysis [11], [18], where the cost of the online algorithm is compared to the cost of the optimal offline solution, i.e. an algorithm having full knowledge about the input sequence. A deterministic online algorithm A is c-competitive if for any input sequence σ it holds that cost_A(σ) ≤ c · cost_OPT(σ) + b, where cost_A(σ) and cost_OPT(σ) denote the cost of A and the optimal offline cost respectively, and b is a constant. For deterministic algorithms the lower bound on the competitive ratio is k [18]. Three of the most prominent k-competitive paging algorithms are LRU (Least Recently Used), FIFO (First In First Out) and FWF (Flush When Full) [18]. For relevant results on online algorithms, we refer the reader to the comprehensive surveys in [3], [5]. For the α-paging model a lower bound on the competitive ratio was shown in [13]; this lower bound is matched by the algorithm α-LRU, which achieves exactly this competitive ratio.
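The deterministic lower bound of k can be illustrated with the classical adversarial input: cycling through k + 1 pages makes LRU fault on every request, while an offline optimum faults only about once every k requests. The following Python sketch (an illustration, not part of the paper's experiments) replays this input:

```python
from collections import OrderedDict

def lru_faults(requests, k):
    """Count page faults of classical LRU with cache size k."""
    cache = OrderedDict()             # ordered from least to most recently used
    faults = 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)      # hit: refresh recency
        else:
            faults += 1               # fault: load p, evict LRU page if full
            if len(cache) == k:
                cache.popitem(last=False)
            cache[p] = None
    return faults

k = 4
adversarial = [i % (k + 1) for i in range(100)]  # cycle over k + 1 pages
print(lru_faults(adversarial, k))                # LRU faults on all 100 requests
```

On this cyclic sequence the page evicted by LRU is always exactly the one requested next, which is what makes every deterministic online strategy vulnerable to an adversary of this kind.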
For classical paging a simple optimal offline algorithm MIN is known [4], and works by evicting, upon a cache miss, the page in cache which is re-requested farthest in the future. Its adaptation α-MIN [13] evicts the α farthest distinct pages and was shown to be optimal in the α-paging model. An online characterization of the optimal offline algorithm describes precisely OPT's possible cache contents given only the sequence seen so far. The first online characterizations of MIN were provided in [15] for the design and analysis of the first strongly competitive algorithm and in [12] in order to prove that LRU is optimal in the diffuse adversary model. Alternative descriptions were provided in [1] and [6] for the purpose of space-efficient strongly competitive randomized algorithms.
In [16] the characterization from [6] was used for designing deterministic online algorithms which always cache all revealed pages, i.e. pages which are guaranteed to be in MIN's cache independent of the future requests. This approach always leads to k-competitive algorithms. More important is the observation that most requests in real-world inputs are requests to revealed pages.
We provide an online characterization of the optimal offline algorithm for α-paging and prove its correctness. It is an adaptation of the characterization from [6]. After each request it partitions the set of pages into three categories:
- revealed pages, contained in the optimal cache for all possible future requests;
- pages which are not in the optimal cache for any possible future request;
- pages for which both scenarios exist: one future in which they belong to the optimal cache and one in which they do not.
Section snippets
Optimal configurations
The optimal offline algorithm α-MIN, in the following denoted OPT, uses the following strategy: if the cache is full and a page fault occurs, it evicts the α pages re-requested farthest in the future. Given the processed sequence σ we are interested in all possible cache contents of OPT. A cache configuration C is a set containing at most k pages. We say that C is valid iff there exists a continuation σ′ such that OPT has cache content C after processing σ, given that the overall input is the concatenation σσ′.
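The eviction rule of α-MIN can be sketched in a few lines of Python. This is a direct transcription of the strategy stated above (a sketch for illustration, not the authors' code); it precomputes, for each request, the index of the next request to the same page:

```python
def alpha_min(requests, k, alpha):
    """Offline alpha-MIN sketch: on a fault with a full cache, evict the
    alpha cached pages whose next request lies farthest in the future."""
    n = len(requests)
    # next_use[i] = index of the next request to requests[i] after position i
    next_use = [n] * n
    last = {}
    for i in range(n - 1, -1, -1):
        next_use[i] = last.get(requests[i], n)
        last[requests[i]] = i
    cache = {}                         # page -> index of its next request
    evictions = 0                      # cost = number of evicted bundles
    for i, p in enumerate(requests):
        if p in cache:
            cache[p] = next_use[i]     # hit: update next-request index
            continue
        if len(cache) == k:
            # evict the alpha pages re-requested farthest in the future
            for victim in sorted(cache, key=cache.get, reverse=True)[:alpha]:
                del cache[victim]
            evictions += 1
        cache[p] = next_use[i]
    return evictions
```

Pages never requested again get next-request index n, so they are always among the first eviction candidates, matching the intuition behind the farthest-in-future rule.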
A new algorithm
Algorithms from the OnOPT class [16] for classical paging always stay in valid configurations. One could do the same for α-paging using the update rule from Theorem 1. Unfortunately this leads to bad empirical performance, since in the α-paging model this approach yields non-lazy algorithms due to update rule (2c). More precisely, in this case we would perform an eviction containing only one page, although we could evict α pages for the same cost.
In the following we describe a new
Experiments
We use the RDM priority strategy [16] for OPTMark and call the resulting algorithm α-RDM. This priority strategy is based on the current timestamp t, which counts the number of requests (excluding requests to revealed pages). Upon a request to page p, RDM assigns p a priority pair: the first component, derived from the value of t at p's last request, represents a recency priority, and the second component represents p's duration of stay in OPT's cache. We run experiments on the traces used in [10],
Conclusion
In this paper, we studied the optimal offline algorithm for α-paging. We provided an online layer structure which determines all possible cache contents of OPT in an online scenario. The layer structure was used to show that online algorithms which cache enough revealed pages have the best possible competitive ratio. Additionally we used the layer structure to design a new algorithm class OPTMark, which approximates the optimal solution on all inputs. Due to the experimental results we conclude
References (18)
- et al., Competitive analysis of randomized paging algorithms, Theor. Comput. Sci. (2000)
- et al., Competitive algorithms for the weighted server problem, Theor. Comput. Sci. (1994)
- et al., On computational models for flash memory devices
- Online algorithms: a survey, Math. Program. (2003)
- A study of replacement algorithms for virtual-storage computer, IBM Syst. J. (1966)
- et al., Online Computation and Competitive Analysis (1998)
- et al., OnlineMin: a fast strongly competitive randomized paging algorithm, Theory Comput. Syst. (2012)
- Managed flash technology
- et al., Paging with request sets, Theory Comput. Syst. (2009)