The fair allocation of resources is a central problem in scheduling. In its minimal formulation, given a number of tasks (demands in manufacturing, flows in networking, etc.), each modeled by a utilization, the goal of the scheduler is to allocate the resource as close as possible to an ideal fluid allocation. The lag measures the distance from the ideal schedule. A scheduler is normally called fair if it guarantees a bounded lag.
Motivated by the observation that a lower lag implies a greater proximity to the ideal resource allocation, in this paper, we propose a scheduling algorithm that produces schedules with lower lag bounds. We accompany our algorithm with some tightness results, showing that below a certain bound, no feasible schedule exists.
1 Introduction
The fair distribution of resources is a central problem in a broad variety of contexts. From the appointment of the chairman of a confederation [27] to the arbitration among impatient children, from manufacturing [19] to network [11, 21, 32] or CPU [2, 3] scheduling, fairness in resource allocation concerns the assignment of an indivisible unitary resource to one among many demands. A common approach is to assign the resource proportionally to a given share [11, 21]. However, the indivisible nature of the resource poses a challenge, as those receiving nothing may perceive the allocation as unfair. Fairness is then normally achieved by compensating over time for past uneven allocations. The goal of a scheduler is then to achieve fairness and a resource allocation which is as close as possible to the ideal case.
We start by informally introducing our nomenclature:
• The resource to be allocated is the CPU time, divided in unit-length time slots (we use the generic term machine as a synonym for CPU, core, etc.);
• The entities demanding such a resource are n tasks;
• Each task has a utilization, representing the needed share of the resource;
• A schedule of the tasks on the machines is determined;
• The distance from the ideal resource schedule is called lag.
We remark that we model each task by its utilization only, and not by an execution time Ci and period Ti. Indeed, the standard (Ci, Ti) model [17] applies more closely to real-time applications. The utilization-only model addressed in this paper, on the other hand, applies to workloads not necessarily released by periodic activations. In such a case, which is common in general purpose OSes or with hypervisors scheduling virtual CPUs, the scheduler aims at providing an abstraction of a slower flow of time.
In this context, the lag measures the distance from the ideal resource schedule, with the limit case of zero lag happening only in the (impractical) case of an ideal “fluid” schedule [21]. In practice, the majority of scheduling algorithms [2, 3, 20] can achieve a lag less than 1. Our research aims at lowering the lag bound (which explains the “tight fairness” in the title): in fact, schedules with a smaller lag offer a more responsive service to tasks. Additionally, our scheduling algorithm has a low complexity and avoids floating-point operations (which explains the “at low cost” in the title). This makes it suitable for actual implementations in operating systems, in which the computational complexity of the scheduler itself is of paramount importance.
Contributions. After the introduction of our system model and notation (in Section 2) and the illustration of a motivating example (in Section 3), our paper offers the following contributions:
• The SimPFair algorithm for the case of two tasks is presented in Section 5. Its lag bound is computed and such a bound is demonstrated tight: no schedule can produce a lower lag;
• The algorithm is then extended to any number of tasks in Section 6, where we also discuss its tightness.
Finally, Section 7 concludes the paper by setting some directions for future investigations.
1.1 Related works
The literature in this area is vast and spans a wide spectrum of disciplines, including at least manufacturing, combinatorics, and scheduling.
In Just-in-Time (JIT) manufacturing [19], the production tries to match the demand of goods requested according to different “shares” (utilizations, in the terminology of this paper). The ultimate goal of JIT is the reduction of storage space (a notion similar to our lag). In this setting, it is proved in [24] that a lag smaller than 1 can be achieved on a single machine, with a formulation in terms of matchings in a bipartite graph. This line was further developed in [8, 9], based on the work of [24], for the Maximum Deviation Just-in-Time (MDJIT) sequencing problem: finding a schedule that minimizes the lag over all tasks at all times for a given utilization, a problem shown to be in Co-NP.
Motivated by questions from combinatorics and number theory, it was proved in [27] in the setting of apportionment problems (in particular for the chairman assignment problem) that, for any given utilization, it is possible to build a single machine schedule with lag bounded by \(1-\frac{1}{2n-2}\), with n tasks. This work is also closely related to the notion of balance in word combinatorics and of symbolic discrepancy (see e.g. the survey [29]; see also [8, 9]).
In network packet scheduling, Weighted Fair Queueing (WFQ) [11] was proposed as a method to guarantee a bounded lag with respect to the ideal resource sharing, called Generalized Processor Sharing (GPS) [21]. Later, Bennett and Zhang realized that “there could be large discrepancies between the service provided by the packet WFQ system and the fluid GPS system”. Hence, they proposed Worst-case Fair WFQ (WF2Q) to fix the issue [32]. Some works have investigated how to change the order of the scheduled flows in the context of round-robin [14, 25, 30] and Deficit Round Robin [16, 23]. In this context, it is finally worth mentioning a utilization upper bound for weighted round robin [31].
In real-time scheduling, fairness among tasks was used as the instrument to achieve optimality (meant as the capacity to fully utilize all m machines) [3]. This seminal work initiated the family of PFair (for proportionate fairness) scheduling algorithms. The core idea was to split a task into subtasks of unit length and then schedule these subtasks according to given rules. Anderson and Srinivasan [2] proposed PD2, which simplified the scheduling rules. A further variant, PD2*, was proposed by Nelissen et al. [20]. Bini proposed a method to adapt a fair schedule in the presence of deviations from the ideal schedule [6].
The most investigated drawback of the PFair algorithms [2, 3, 20] is perhaps the large number of incurred preemptions. To mitigate this phenomenon, Zhu et al. [33] proposed to check the fairness only at task deadlines (naming their algorithm BFair, as it checks fairness only at task boundaries). Nelissen et al. [20] extended it to the case of sporadic job releases. Megel et al. [18] proposed a linear programming formulation to reduce the number of preemptions. Holman and Anderson [13] proposed to offset the time quanta over the processors to reduce the overhead. Deadline Fair Scheduling (DFS) [10] was proposed to fix some non-work-conserving issues of PFair when implemented on real OSes. In our paper, instead, we take an orthogonal direction: we keep the original PFair time quantization and we devise a scheduling algorithm that lowers the lag bound below 1.
All the mentioned valuable works exploit, in different forms, the periodic releases (and deadlines) of tasks modeled by the execution time Ci and period Ti pair. They achieve optimality by relaxing the stringency of the PFair constraint of always having a lag less than one. Motivated by the existence of workloads not bound to periodic releases, our research takes a different direction: how much can we reduce the lag of a feasible schedule? Can we implement such a scheduler in OSes? In the next section, our journey starts with the notation and the system model.
2 System model
We are given n tasks, represented by the integers 1, 2, …, n and we denote the set of tasks by \(\mathcal {T}=\lbrace 1,2,\ldots ,n\rbrace\). We schedule them over unit time intervals. Hence, we represent time by the natural numbers \(\mathbb {N}\) with \(0\in \mathbb {N}\). We schedule tasks over a single machine.
The schedule is represented by a function \(S:\mathbb {N}\rightarrow \mathcal {T}\) with S(t) being the task scheduled at time t. We may use the shorter expression “to schedule at time t” to mean “to schedule over the unit-length dense interval [t, t + 1)”. We may also represent a schedule explicitly by listing the scheduled tasks over time.
For \(a,b\in \mathbb {N}\) with a ≤ b, we denote by \(|S(a,b)|_i\) the number of time slots in [a, b) allocated to task i, that is, \(|S(a,b)|_i=\#\lbrace t\in \mathbb {N}: a\le t\lt b,\ S(t)=i\rbrace\). Notice that the scheduling decision at time b is not accounted for in |S(a, b)|i. For example, |S(0, t)|i is the total amount of time allocated to task i over the first t units of time (starting to count from zero, as \(0\in \mathbb {N}\)). Obviously, \(|S(a,b)|_i\le b-a\) for every task i.
Each task i is associated with a utilization αi representing the fraction of time the task i is scheduled. We assume that αi ∈ (0, 1) and we use \(\boldsymbol {\alpha }\in (0,1)^n\) to denote (α1, …, αn). The vector α is called the utilization vector. In general, we use the bold font to represent vectors of dimension n. Also, all task utilizations are rational numbers \(\alpha _i\in \mathbb {Q}\) (unless specified differently), and we write them as \(\alpha _i=\frac{p_i}{q}\), with \(p_i,q\in \mathbb {N}\) non-zero.
We also assume the full utilization hypothesis
\begin{equation}\boldsymbol {\alpha }\cdot \mathbf {1}=\sum _{i=1}^n\alpha _i=1, \end{equation}
(2)
meaning that the n tasks fully utilize the machine. If α · 1 < 1, we can always add one fictitious task with utilization 1 − α · 1 to fulfill the hypothesis of (2).
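For example, two tasks with utilizations \(\frac{1}{2}\) and \(\frac{1}{3}\) would be complemented by a fictitious third task with utilization \(\frac{1}{6}\); this is, in fact, the utilization vector used in the motivating example of Section 3.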
A necessary condition for the schedule S is to allocate time in accordance with the utilizations, that is,
\begin{equation}\forall i\in \mathcal {T},\qquad \lim _{t\rightarrow \infty }\frac{|S(0,t)|_i}{t}=\alpha _i. \end{equation}
(3)
However, schedules satisfying only (3) are not relevant to real-time systems, as they still allow arbitrarily long intervals [a, b) in which task i is never scheduled (with |S(a, b)|i = 0).
In response to the full utilization hypothesis of (2), we assume that the scheduler S never idles, that is,
\begin{equation}\forall t\in \mathbb {N},\qquad S(t)\in \mathcal {T}. \end{equation}
(4)
Figure 1: Example of lag for one task with αi = 1/3, with a given schedule S represented in red. The lag is maximum at time 5, when \(\ell _i(5)=\frac{1}{3} 5-1=\frac{2}{3}\), and minimum at time 1 when \(\ell _i(1)=\frac{1}{3} 1-1=-\frac{2}{3}\).
Following a standard terminology [2, 3, 21, 32], the deviation from the ideal schedule, which assigns a constant fraction of time αi to task i, is represented by the lag of task i, defined by
\begin{equation}\ell _i(t)=\alpha _i\, t-|S(0,t)|_i, \end{equation}
(5)
and we use ℓ(t) = (ℓ1(t), ℓ2(t), …, ℓn(t)) to represent the lag of all n tasks at time t. The notion of lag is analogous to notions in other areas: in combinatorics it is called the discrepancy of a word [1], while in network calculus [7, 15] it is called the backlog of a flow. As illustrated in Figure 1, the lag ℓi(t) represents how much the schedule of task i is “late” w.r.t. the ideal “fluid” schedule.
3 Background and motivating example
In real-time systems, only schedulers which can guarantee a bounded lag are of interest. Otherwise, if the lag is not bounded, the resource supplied to task i may diverge from the ideal share αi by an arbitrarily large amount. Hence the following definition.
Definition 3.1
A schedule S has a lag bound L when
\begin{equation}\max _i\sup _t |\ell _i(t)| \le L. \end{equation}
(6)
We remark that it is necessary to take the maximum over i because \(\sup _t |\ell _i(t)|\) may differ task by task.
Also, an elementary property of the lag is that
\begin{equation}\forall t,\quad \boldsymbol {\ell }(t)\cdot \mathbf {1}= \sum _{i=1}^n\ell _i(t)= \!\!\!\overbrace{(\boldsymbol {\alpha }\cdot \mathbf {1})}^{\text{is 1, from (2)}}\!\!\!t - \!\!\!\!\overbrace{\sum _i|S(0,t)|_i}^{\text{is $t$, from (4) and (1)}}\!\!\!\! = 0. \end{equation}
(7)
The most notable lag bound for schedulers is L = 1. In fact, seminal results [2, 3] proposed scheduling algorithms which guarantee the lag to be bounded by L = 1. We remark that from (5) and (6) it follows that, when the lag is bounded by L, then
\begin{equation}\forall i,\ \forall t,\qquad \alpha _i\, t-L\le |S(0,t)|_i\le \alpha _i\, t+L. \end{equation}
(8)
Hence the lag bound L can be interpreted as the proximity of the time |S(0, t)|i allocated to task i to its ideal “fluid” allocation αit. This interpretation justifies our quest for schedules with minimal lag.
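For instance, with αi = 1/3 and a lag bound of L = 1/2, condition (8) at time t = 12 forces \(3.5\le |S(0,12)|_i\le 4.5\), i.e., task i must have been scheduled exactly 4 times by then; with the looser bound L = 1, any value in {3, 4, 5} would be allowed.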
When the number of tasks is n = 2, the intersection of the constraint of (7) and the lag bound of L = 1 gives the segment \(\lbrace (\ell _1,-\ell _1):\ell _1\in [-1,1]\rbrace\). When n = 3, the analogous intersection of the plane of (7) with the cube [ − 1, 1]3 is a regular hexagon, represented by the gray dashed polygon in Figure 2.
Figure 2: When n = 3, if the lag is bounded by L = 1, the vector \(\boldsymbol {\ell }(t)\in \mathbb {R}^3\) is constrained by Equations (6) and (7) over a regular hexagon (represented by a gray dashed line). The figure also shows the starting lag ℓ(0) = (0, 0, 0) as a black dot, and the coordinates of all the vertices of the hexagon.
Figure 2 illustrates both the lag bound of (6) with L = 1 represented by a cube centered at the origin and with edge of length 2, and the constraint of (7).
As tasks are scheduled, the lag vector ℓ(t) moves from 0 to other points. Let us now illustrate such a lag dynamics over the hexagon with a simple example of n = 3 tasks with utilizations
\begin{equation}\boldsymbol {\alpha }=\left(\tfrac{1}{2},\tfrac{1}{3},\tfrac{1}{6}\right). \end{equation}
(9)
When task s is scheduled at time t, each lag component ℓi varies by αi, minus 1 for the scheduled task s (see (5)). Hence, the lag moves along one of the three vectors
\begin{equation}\left(-\tfrac{1}{2},\tfrac{1}{3},\tfrac{1}{6}\right),\qquad \left(\tfrac{1}{2},-\tfrac{2}{3},\tfrac{1}{6}\right),\qquad \left(\tfrac{1}{2},\tfrac{1}{3},-\tfrac{5}{6}\right), \end{equation}
(10)
for s = 1, 2, 3 respectively.
Figure 3 illustrates these three vectors over a planar view of the same hexagon of Figure 2, starting from the initial zero lag condition of ℓ(0) = 0. In the figure, the scheduled task S(t) is represented by a move of the current lag ℓ(t) to the next lag ℓ(t + 1) along the corresponding vector, indicated by (10). A schedule S has lag bounded by 1 if all lag points ℓ(t) are always contained in the hexagon represented in Figure 3.
Figure 3: Representing the variation of the lag over a planar view of the hexagon of Figure 2. When n = 3, the lag bound constraint of (6) with L = 1 requires the lag ℓ(t) to always stay within the regular hexagon (depicted in dashed gray). Depending on the scheduled task, the lag ℓ(t) varies along the red vector (if S(t) = 1), the blue vector (if S(t) = 2), or the green vector (if S(t) = 3), as indicated in (10).
PFair algorithms [2, 3] always generate schedules with lag bounded by L = 1. For example, with the utilizations α of Equation (9) illustrated above, they may generate either of the two schedules S: 1, 2, 1, 2, 1, 3, … or S′: 1, 2, 1, 3, 1, 2, … (both repeating with period 6),
depending on how the (arbitrary) ties are broken at time 3 between task 2 and 3. These two schedules actually have a tighter lag bound than 1. Specifically, the lag of the schedule S is bounded by \(\frac{5}{6}\approx 0.833\), whereas the lag of S′ is bounded by \(\frac{2}{3}\approx 0.667\). Figure 4 shows the trajectory of the lag ℓ(t) with the schedule S′.
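These bounds can be checked numerically. The following minimal C sketch (ours, not one of the paper's listings; names and structure are illustrative) replays one period of a schedule and tracks the scaled lags q·ℓi(t) as integers, directly applying the lag definition of (5):

#include <stdio.h>
#include <stdlib.h>

/* Replay one period of a schedule and track the scaled lags q*l_i(t):
 * at each slot every lag grows by p_i = q*alpha_i and the scheduled
 * task additionally loses q.  Data: alpha = (3, 2, 1)/6 and the
 * schedule S' = 1, 2, 1, 3, 1, 2 of Figure 4.                        */
int main(void)
{
    const int n = 3, q = 6;
    const int p[3] = {3, 2, 1};              /* q * alpha_i            */
    const int sched[6] = {1, 2, 1, 3, 1, 2}; /* S'(0), ..., S'(5)      */
    int lag[3] = {0, 0, 0};                  /* q * l_i(t), l(0) = 0   */
    int worst = 0;

    for (int t = 0; t < 6; t++)
        for (int i = 0; i < n; i++) {
            lag[i] += p[i] - (sched[t] == i + 1 ? q : 0);
            if (abs(lag[i]) > worst)
                worst = abs(lag[i]);
        }

    printf("largest |lag| = %d/%d\n", worst, q);   /* prints 4/6 = 2/3 */
    return 0;
}

Replacing the schedule array with S: 1, 2, 1, 2, 1, 3 makes the program report 5/6, matching the two bounds above.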
Figure 4: Trajectory of the lag ℓ(t) with the schedule S′: 1, 2, 1, 3, 1, 2, … (repeated with period 6). The color of the dots represents the scheduled task: red if S(t) = 1, blue if S(t) = 2, green if S(t) = 3 (the same colors as in Figure 3). From any ℓ(t), the lag varies according to the vector represented in Figure 3 (with the coordinates of Equation (10)), depending on the scheduled task S(t). Each segment has the same color as the point it originates from. The represented schedule S′ has a lag bound of \(L= \frac{2}{3}\), which occurs at time t = 5, with \(\boldsymbol {\ell }(5)=\left(-\frac{1}{2},\frac{2}{3},-\frac{1}{6}\right)\).
Motivated by the utility of a smaller lag bound, as shown by (8), we next provide schedules with a smaller lag. Figure 5 illustrates two schedules, both yielding a lag bound of \(\frac{1}{2}\). We remark that, to the best of our knowledge, the two schedules of Figure 5 are not generated by any previously known scheduling algorithm. One of them, S, is instead generated by Algorithm 2, later described in Section 6.
Figure 5: The trajectory of the lag ℓ(t) for the two schedules with lag bounded by \(L=\frac{1}{2}\).
4 The lag as a dynamical system
The relation of (5) can also be expressed by the map \(\mathsf {nextLag}:\mathbb {R}^n\times \mathcal {T}\rightarrow \mathbb {R}^n\), with \(\mathsf {nextLag}(\boldsymbol {\ell },s)\) equal to the lag after task s is scheduled from a state with lag ℓ. Each component \(\mathsf {nextLag}_i(\boldsymbol {\ell },s)\), with i = 1, …, n, is defined by \(\mathsf {nextLag}_i(\boldsymbol {\ell },s)=\ell _i+\alpha _i-1\) if i = s, and \(\mathsf {nextLag}_i(\boldsymbol {\ell },s)=\ell _i+\alpha _i\) otherwise, because, from (5), in one time unit the lag of task i is always incremented by αi and then it is decremented by one only if task i is scheduled.
We only consider schedulers which take the scheduling decision at time t based on the lag ℓ(t) only. We do so by defining the function \(i^*:\mathbb {R}^n\rightarrow \mathcal {T}\), which returns the task i*(ℓ) to be scheduled when the lag is equal to ℓ. In this case, the schedule function is then set as S(t) = i*(ℓ(t)). Notice that such schedulers are time-invariant: whenever the lag takes the value ℓ, the scheduled task will always be i*(ℓ) regardless of when this is happening.
The composition of
• the function \(\mathsf {nextLag}(\boldsymbol {\ell },s)\), returning the next lag, and
• the function i*(ℓ), returning the scheduled task from lag ℓ,
enables the definition of the self-map \(M:\mathbb {R}^n\rightarrow \mathbb {R}^n\) as \(M(\boldsymbol {\ell })=\mathsf {nextLag}(\boldsymbol {\ell },i^*(\boldsymbol {\ell }))\).
This self-map will be extensively used in the rest of the paper as it is an essential ingredient to study the lag over time. In fact, it allows the following characterization of the lag:
\begin{equation}\boldsymbol {\ell }(0)=\mathbf {0},\qquad \boldsymbol {\ell }(t+1)=M(\boldsymbol {\ell }(t)),\qquad \text{hence}\quad \boldsymbol {\ell }(t)=M^t(\mathbf {0}). \end{equation}
(11)
5 SimPFair: the two-task case
We start with the case of n = 2 tasks, for which
\begin{equation}\alpha _2=1-\alpha _1\qquad \text{and}\qquad \ell _2(t)=-\ell _1(t)\quad \forall t, \end{equation}
(12)
from (2) and (7). Because of the above relation, we often use α1 and ℓ1 only, as α2 and ℓ2 immediately follow from (12).
Figure 6: Representation of the self-map M(ℓ1) when \(\alpha _1=\frac{2}{7}\). The task selection function i*(ℓ1) is also represented by the color of the segment: i*(ℓ1) = 1 when M(ℓ1) is blue, whereas i*(ℓ1) = 2 when M(ℓ1) is red.
We first express SimPFair through the self-map \(M:\mathbb {R}^2\rightarrow \mathbb {R}^2\) of the lag dynamics and the task selection rule \(i^*:\mathbb {R}^2\rightarrow \lbrace 1,2\rbrace\), as described in Section 4. Notice, however, that since ℓ2 = −ℓ1, we can and do express M(ℓ1) and i*(ℓ1) as functions of the task 1 lag ℓ1 only. Hence, M(ℓ1) and i*(ℓ1) of SimPFair are defined as follows:
• the self-map \(M:[-\frac{1}{2},\frac{1}{2}) \rightarrow [-\frac{1}{2},\frac{1}{2})\) is defined by
\begin{equation}M(\ell _1)=\begin{cases}\ell _1+\alpha _1-1 & \text{if } \ell _1\ge \frac{1}{2}-\alpha _1,\\ \ell _1+\alpha _1 & \text{otherwise;}\end{cases} \end{equation}
(13)
• the task selection rule \(i^*:[-\frac{1}{2},\frac{1}{2})\rightarrow \lbrace 1,2\rbrace\) is defined by
\begin{equation}i^*(\ell _1)=\begin{cases}1 & \text{if } \ell _1\ge \frac{1}{2}-\alpha _1,\\ 2 & \text{otherwise.}\end{cases} \end{equation}
(14)
The schedule S is then set by S(t) = i*(Mt(0)) for all non-negative integer t, with M0 denoting the identity map, as usual. Both maps M(·) and i*(·) are represented in Figure 6 for an example with \(\alpha _1=\frac{2}{7}\).
The algorithm implementing SimPFair for two tasks is shown in Alg. 1 and it is written as pseudo-code of the self-map M(ℓ1) of (11), also assigning the scheduled task i*(ℓ1). Hence, it is a function that takes the lag ℓ1(t) of task 1 at any given time t as input (remember: when n = 2, the lag ℓ2 of task 2 is always ℓ2 = −ℓ1). Depending on such a value of ℓ1, it determines
• the task i*(ℓ1) ∈ {1, 2} to be scheduled (at Lines 2–5), and
• the lag M(ℓ1) after the task i*(ℓ1) is scheduled (Lines 7–9).
The first invocation is made by SimPFair(0), since from (11) the initial lag equals zero. In the pseudo-code of Alg. 1, the invocation of Schedule(i*) corresponds to some OS call which performs all the necessary operations to schedule task i* ∈ {1, 2}.
Figure 7: When α = (2/7, 5/7), Algorithm 1 produces the schedule S: 2, 1, 2, 2, 2, 1, 2, …(repeated with period 7). The figure represents the scheduled task by colors: a red dot represents task 1, a blue one represents task 2. The dots are placed over the lag ℓ1 axis (remember that when n = 2, we have ℓ2 = −ℓ1). Arcs represent the variation of ℓ1 upon scheduling decisions. When task 1 is scheduled (red arcs) they go backward as ℓ1 decreases. When task 2 is scheduled (blue arcs), they point forward as ℓ1 = −ℓ2 increases.
Figure 7 illustrates the iterations when p1 = 2, p2 = 5, q = p1 + p2 = 7, i.e., α = (2/7, 5/7), the same as in Figure 6, and the corresponding scheduling decisions. In this case, the lag threshold determining if task 1 or 2 is scheduled is \(\frac{1}{2}-\alpha _1=\frac{3}{14}\) (also depicted in Figures 6 and 7 by a vertical dashed line together with \(-\frac{1}{2}\), 0, and \(\frac{1}{2}\)). At time t = 0, the scheduled task is i*(0) = 2 and the next lag \(\ell _1(1)=M(\ell _1(0))=M(0)=\frac{2}{7}\). Such a lag exceeds the lag threshold, hence \(S(1)=i^*(M(0))=i^*(\frac{2}{7})=1\). This scheduling decision is represented by the first red dot from the left. In this case, the first condition of (13) applies and the lag decreases to the value of \(M^2(0)=M(\frac{2}{7})=\frac{2}{7}+\frac{2}{7}-1=-\frac{3}{7}\), the leftmost blue dot of the figure. The scheduling decisions and the lag dynamics then proceed as indicated by the arrow in the figure with the resulting schedule being S: 2, 1, 2, 2, 2, 1, 2, … and repeating with period 7. Such a schedule admits a lag bound of \(\frac{3}{7}\), which is strictly tighter than 1. Linking to combinatorics, we note that such a schedule is a “conjugate” of a so-called Christoffel word, in the sense that the schedule 2,2,2,1,2,2,1 (which is a Christoffel word) is obtained by moving the prefix 2,1 of the schedule S to the end. Such words play a significant role in combinatorics and digital geometry [4, 22].
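For concreteness, iterating the map from ℓ1 = 0 visits the values \(0, \tfrac{2}{7}, -\tfrac{3}{7}, -\tfrac{1}{7}, \tfrac{1}{7}, \tfrac{3}{7}, -\tfrac{2}{7}\) and then returns to 0, with scheduled tasks 2, 1, 2, 2, 2, 1, 2, respectively; the largest absolute value visited is \(\frac{3}{7}\), which is exactly the lag bound mentioned above.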
The next theorem states that the definitions of M(ℓ1) and i*(ℓ1) of (13) and (14) are correct, in the sense that they generate schedules with the assigned utilization values.
Theorem 5.1
Let α = (p1/q, p2/q) be the utilization of two tasks in \(\mathcal {T}=\lbrace 1,2\rbrace\), with q = p1 + p2 and p1, p2 positive integers. Let \(S:\mathbb {N}\rightarrow \lbrace 1,2\rbrace\) be the schedule defined as S(t) = i*(Mt(0)) for all t, with M(·) and i*(·) defined in (13) and (14), respectively.
Then:
• the schedule S has period q (i.e., S(t + q) = S(t) for all t),
• the schedule S complies with the task utilizations α, and
• the lag of the tasks is ℓ(t) = (Mt(0), −Mt(0)) for all t.
Proof.
Let us first prove by induction on t that the lag ℓ1 of task 1 satisfies ℓ1(t) = Mt(0) for all t. When t = 0, the statement ℓ1(0) = M0(0) follows because ℓ1(0) = 0 from the lag definition of (5) and M0(0) = 0 as M0(·) is the identity map. From (13) it follows that, for any \(t \in {\mathbb {N}}\), Mt + 1(0) − Mt(0) ∈ {α1 − 1, α1}, with Mt + 1(0) − Mt(0) = α1 − 1 if and only if i*(Mt(0)) = 1 by (14). Furthermore, by the property of (11), the lag ℓ1(t) satisfies ℓ1(t + 1) − ℓ1(t) = α1 − 1 if S(t) = 1, and ℓ1(t + 1) − ℓ1(t) = α1, otherwise. Since the schedule is built by setting S(t) = i*(Mt(0)), we find that
\begin{equation}p_1-|S(0,q)|_1=\alpha _1\, q-|S(0,q)|_1=\ell _1(q)=M^q(0)\in \left[-\tfrac{1}{2},\tfrac{1}{2}\right), \end{equation}
(15)
with the last inclusion holding because the map M takes its values in the set [ − 1/2, 1/2). The expression p1 − |S(0, q)|1 above must be an integer and the only possible integer in [ − 1/2, 1/2) is 0. Hence, |S(0, q)|1 = p1, and necessarily |S(0, q)|2 = q − p1 = p2, proving that the schedule S complies with the utilization α.
Lastly, by (15), since |Mq(0)| ≤ 1/2 and ℓ1(q) = Mq(0) is an integer, we get Mq(0) = 0, meaning that after q time units the schedule S repeats. □
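For instance, in the example of Figure 7 (p1 = 2, p2 = 5, q = 7), the lag returns to 0 after exactly q = 7 steps, during which task 1 is scheduled p1 = 2 times and task 2 is scheduled p2 = 5 times, as the theorem predicts.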
We now show that no schedule exists with a lag smaller than the one of SimPFair. For this purpose, for any given real number \(x\in \mathbb {R}\), we use ⌊x⌋/q to denote the largest rational number with denominator q not exceeding x, that is, \(\left\lfloor x\right\rfloor _{/q}=\frac{\lfloor q\,x\rfloor }{q}\), and \(\frac{1}{q}\mathbb {Z}=\lbrace \frac{x}{q}:x\in \mathbb {Z}\rbrace\) for the set of all rational numbers with denominator q, possibly reducible. For example, \(\left\lfloor \frac{1}{2}\right\rfloor _{/5}=\frac{2}{5}\), whereas \(\left\lfloor \frac{1}{2}\right\rfloor _{/4}=\frac{1}{2}\).
We now find the tightest lag bound for the schedule S. This schedule coincides with the schedule described in [8]. In particular, we recover [8, Theorem 9] for the expression \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}\) of the tightest lag bound given in the next statement.
Theorem 5.2
Let α = (p1/q, p2/q) be the utilization of two tasks, with q = p1 + p2 and p1, p2 positive integers. We also assume that p1 and p2 are coprime. Let the schedule S be defined by S(t) = i*(Mt(0)) for all t.
Then
• the schedule S has a lag bound of \(L= \left \lfloor \frac{1}{2} \right \rfloor _{/q}\); moreover, this bound is attained: \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}= \max _t|\ell _1(t)|\);
• the lag bound L is tight, i.e., no schedule with a lag bound smaller than \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}\) exists.
In words, this theorem
• determines the tightest lag bound for schedules S generated by SimPFair, and
• shows that no other schedule with lower lag may exist.
Proof of Theorem 5.2
By Theorem 5.1, the lag bound L of the schedule S is smaller than or equal to 1/2 since ℓ(t) = (Mt(0), −Mt(0)) for all t and M takes its values in [ − 1/2, 1/2).
Moreover, \(M^t(0)\in \frac{1}{q}\mathbb {Z}\) for all t. Indeed, we prove it by induction by recalling that \(M^0(0)=0\in \frac{1}{q}\mathbb {Z}\), and that \(M^{t+1}(0)-M^t(0) \in \lbrace \frac{p_1}{q}, \frac{p_1}{q}-1\rbrace \subset \frac{1}{q}\mathbb {Z}\) for all t. This gives
\begin{equation}\lbrace M^t(0) : t\in \mathbb {N}\rbrace \subseteq \tfrac{1}{q}\mathbb {Z}\cap \left[-\tfrac{1}{2},\tfrac{1}{2}\right), \end{equation}
(17)
which implies in particular that \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}\) is a lag bound for S. Note also that
\[\begin{eqnarray*}\lbrace M^t(0) : 0 \le t \lt q\rbrace = \lbrace M^t(0) : 0 \le t \in {\mathbb {N}}\rbrace\end{eqnarray*}\]
since M has period q.
Let us now prove that the map M is invertible (see Figure 6 for an illustration). Note first that, by (17), the map M takes finitely many values, as it acts on \(\frac{1}{q}\mathbb {Z}\cap [-\frac{1}{2},\frac{1}{2})\); hence it is sufficient to prove that M is surjective to deduce that it is invertible. Now, if ℓ ∈ [α1 − 1/2, 1/2), then ℓ′ ≔ ℓ − α1 is such that − 1/2 ≤ ℓ′ < 1/2 − α1 and M(ℓ′) = ℓ; similarly, if ℓ ∈ [ − 1/2, α1 − 1/2), then ℓ′ ≔ ℓ − α1 + 1 is such that 1/2 − α1 ≤ ℓ′ < 1/2 and M(ℓ′) = ℓ. This proves that M is surjective and thus invertible.
The next step is to prove that if Mt(0) = 0 for some t with 0 < t ≤ p1 + p2 = q, then t = p1 + p2 = q. Let t with 0 < t ≤ p1 + p2 = q be such that Mt(0) = 0. One has \(0=M^t(0)=\ell _1(t)=\frac{p_1}{q}\,t-|S(0,t)|_1\), which, using t = |S(0, t)|1 + |S(0, t)|2 and q = p1 + p2, gives \(p_2\,|S(0,t)|_1=p_1\,|S(0,t)|_2\).
Since \(\gcd (p_1,p_2)=1\), this implies that |S(0, t)|1 = kp1 and |S(0, t)|2 = kp2 for some positive integer k. This also implies that t = |S(0, t)|1 + |S(0, t)|2 = k(p1 + p2). However, from 0 < t ≤ p1 + p2 = q, it can only be t = p1 + p2 = q, as required.
We deduce in particular that the points Mt(0) are distinct for 0 ≤ t < p1 + p2. Indeed, if Mt(0) = Ms(0) for some s, t, with 0 ≤ s ≤ t < p1 + p2, then Mt − s(0) = 0, since the map M is invertible; but then s = t, from what precedes.
This implies that the inclusion from (17) is in fact an equality:
\begin{equation}\lbrace M^t(0) : t\in \mathbb {N}\rbrace = \tfrac{1}{q}\mathbb {Z}\cap \left[-\tfrac{1}{2},\tfrac{1}{2}\right). \end{equation}
(18)
Indeed there are exactly q rational points with denominator q in the interval [ − 1/2, 1/2). Also, the q points Mt(0), for 0 ≤ t < q are all distinct. We then deduce from (18) that \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}= \max _t|\ell _1(t)|\).
It remains to prove the tightness for the lag bound: no schedule exists with lower lag. Consider a schedule S′ of tasks with utilizations α. We denote its lag by \(\ell _1^{\prime }(t)\) and we suppose \(|\ell _1^{\prime }(t)|\le L \lt \left \lfloor \frac{1}{2} \right \rfloor _{/q}\). The set of all possible lag values \(\mathbb {L}=\lbrace \ell _1^{\prime }(t):\forall t\in \mathbb {N}\rbrace\) is a subset of \(\frac{1}{q}\mathbb {Z}\) because, by induction on t, \(\ell _1^{\prime }(0)=0\in \frac{1}{q}\mathbb {Z}\) and \(\ell _1^{\prime }(t+1)-\ell _1^{\prime }(t)\in \lbrace \frac{p_1}{q},\frac{p_1}{q}-1\rbrace \subset \frac{1}{q}\mathbb {Z}\). Hence, the number of all possible lag values in \(\mathbb {L}\) is less than q, because the choice of \(L\lt \left \lfloor \frac{1}{2} \right \rfloor _{/q}\) implies that \(\mathbb {L}\subset [-L,L]\cap \frac{1}{q}\mathbb {Z}\).
Since \(\mathbb {L}\) is finite, there must be two distinct time instants with the same lag. Let \(a,b\in \mathbb {N}\), a < b, be such that \(\ell _1^{\prime }(a)=\ell _1^{\prime }(b)\), with b − a minimal among such pairs. From the dynamics of the lag of (11),
\[\begin{eqnarray*}0=\ell _1^{\prime }(b)-\ell _1^{\prime }(a)=\frac{p_1}{q}(b-a)-|S^{\prime }(a,b)|_1,\end{eqnarray*}\]
from which \(|S^{\prime }(a,b)|_1=\frac{p_1}{q}(b-a)\). Since |S′(a, b)|1 is an integer and p1, q are coprime, (b − a) must be a positive multiple of q; this contradicts the fact that, by the minimality of b − a, the lags \(\ell _1^{\prime }(t)\) with t ∈ [a, b) are all distinct, while the cardinality of \(\mathbb {L}\) is less than q. Hence, no schedule S′ with lag bound smaller than \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}\) may exist. □
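For instance, for the example of Figure 7 (p1 = 2, p2 = 5, q = 7), the tightest bound is \(\left\lfloor \frac{1}{2}\right\rfloor _{/7}=\frac{3}{7}\), which is exactly the largest |ℓ1(t)| reached by the schedule S: 2, 1, 2, 2, 2, 1, 2, …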
Hence, SimPFair finds the schedule of two tasks with the lowest possible lag bound, which is:
• if q is even, L = 1/2, and
• if q is odd, \(L= \left \lfloor \frac{1}{2} \right \rfloor _{/q}=\frac{q-1}{2q} \lt 1/2\).
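For example, with α = (1/4, 3/4) (hence q = 4, even), the lag ℓ1 of the SimPFair schedule visits the values 0, 1/4, −1/2, −1/4, so the bound L = 1/2 is indeed attained.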
has the same lag bound as S′. Hence, the minimum lag bound must occur for two tasks. From Theorem 5.2, the lag bound is \(\left \lfloor \frac{1}{2} \right \rfloor _{/q}=\frac{q-1}{2q}\), and it takes its minimum value of \(\frac{1}{3}\) when q = 3. When q = 3, the utilizations of the two tasks are \(\frac{2}{3}\) and \(\frac{1}{3}\) and the schedule with such a minimal lag is S: 1, 2, 1, 1, 2, 1, … □
We remark that SimPFair takes its scheduling decisions at very low computational cost. The only mathematical operations required by Algorithm 1 are the addition of Line 7 and possibly the subtraction of Line 9; no division and no ceiling operation are needed, unlike, for example, the standard PFair scheduling rules [2, 3].
These computational advantages of Alg. 1 are even more striking in the C language, which is the standard language for OSes. Listing 1 shows the implementation in C. In such an implementation:
• We assume that the denominator q of both α1 and α2 is \(2^{32}\), with integers represented in 32 bits. This assumption introduces an error in the bandwidth allocation in the order of \(\frac{1}{2^{32}} \lt 10^{-9}\), which seems acceptable;
• We use p1 = α1q to represent the bandwidth over the unsigned integers with the variable p1. For example, if \(\alpha _1=\frac{3}{4}\) then p1=0xC0000000, or if \(\alpha _1=\frac{1}{3}\) then p1=0x55555555;
• The lag1 parameter is the representation over the signed integers of q ℓ1, the lag ℓ1 multiplied by q (equal to \(2^{32}\) in this case);
• The variable i_star corresponds to the task selection rule i* in Alg. 1 and represents the ID of the task to be scheduled. The expression of the conditional assignment of Line 8 in Listing 1 comes from multiplying by \(q=2^{32}\) the condition of Line 2 of Alg. 1. Notice that in two's complement the macro INT_MIN, equal to \(-2^{31}\), has the same representation as \(\frac{q}{2}=2^{31}\), which is the translation of \(\frac{1}{2}\) at Line 2 of Alg. 1 into Listing 1;
• Finally, lag1 is simply incremented by p1, which is what we get by multiplying Line 7 by q. Remarkably, in C/Assembly there is no need to conditionally subtract q from lag1 as required at Line 9. In fact, q equals \(2^{32}\) and subtracting \(2^{32}\) on 32-bit integers has no effect: the possibly overflowed value of lag1+p1 is exactly what we would achieve by subtracting \(q=2^{32}\) on a wider representation.
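To make the above description concrete, the following sketch (ours; the actual Listing 1 of the paper may differ in names and form) performs one scheduling step with q = 2^32, exploiting the wrap-around discussed in the last item and assuming the usual two's-complement integer representation:

#include <stdint.h>

/* One scheduling step for two tasks with q = 2^32:
 *   p1   = alpha1 * 2^32, held in an unsigned 32-bit integer;
 *   lag1 = q * l1(t),     held in a signed 32-bit integer.
 * Returns the task to schedule and updates lag1 in place.            */
int simpfair_step(int32_t *lag1, uint32_t p1)
{
    /* Task 1 is selected when l1 + alpha1 >= 1/2, i.e., when
     * q*l1 + p1 >= q/2 = 2^31 (the value whose 32-bit representation
     * coincides with INT_MIN, as noted above).                       */
    int i_star = ((int64_t)*lag1 + (int64_t)p1 >= ((int64_t)1 << 31)) ? 1 : 2;

    /* Lag update: lag1 += p1, minus q = 2^32 when task 1 is scheduled.
     * On 32-bit integers the subtraction of 2^32 is the wrap-around
     * itself, so a plain (possibly overflowing) addition suffices.   */
    *lag1 = (int32_t)((uint32_t)*lag1 + p1);

    return i_star;
}

Invoked repeatedly starting from lag1 = 0 with p1 ≈ 2^32 · 2/7, this reproduces the first period 2, 1, 2, 2, 2, 1, 2 of the schedule of Figure 7, up to the 2^−32 rounding of α1 discussed in the first item.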
6 SimPFair: the general case
6.1 Description of the algorithm
The general version of the SimPFair algorithm, which can be used with any number n of tasks, is described in Alg. 2. The algorithm depends on a fixed constant L and can guarantee a schedule with lag bounded by L. We will discuss later how we can choose a valid value for L.
Let us now describe the algorithm. For any tuple ℓ = (ℓ1, …, ℓn) of the tasks' lags at a given time, Line 2 computes the set of candidate tasks \(\mathcal {I}(\boldsymbol {\ell }) \subseteq \mathcal {T}\) which may be scheduled, defined by
\begin{equation}\mathcal {I}(\boldsymbol {\ell })=\lbrace i\in \mathcal {T}: \ell _i\ge 1-\alpha _i-L\rbrace . \end{equation}
(19)
The set of tasks \(\mathcal {I}(\boldsymbol {\ell })\) guarantees that whichever task in \(\mathcal {I}(\boldsymbol {\ell })\) is chosen at time t, its lag ℓi(t + 1) at time t + 1 will never be below − L, hence preserving the bound L from below for ℓ(t).
Among the tasks in \(\mathcal {I}(\boldsymbol {\ell })\), the one chosen to be scheduled, i*(ℓ), is defined as
\begin{equation}i^*(\boldsymbol {\ell })=\arg \min _{i\in \mathcal {I}(\boldsymbol {\ell })} \frac{L-\ell _i}{\alpha _i}. \end{equation}
(20)
In words, the algorithm picks, among the candidate tasks in \(\mathcal {I}(\boldsymbol {\ell })\), the one that would first reach the upper bound L, by considering that task i progresses at speed αi. The schedule is then built from i* by setting S(t) = i*(ℓ(t)).
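As an illustration, the following C sketch (ours, not the paper's pseudo-code of Alg. 2; the selection rule follows the in-words description above) performs one such step with all quantities scaled by q, so that only integer arithmetic is needed:

#include <stdint.h>

/* One step of the general rule: lag[i] = q*l_i, p[i] = q*alpha_i, Lq = q*L.
 * Returns the scheduled task (0-based), assuming I(l) is non-empty,
 * and updates the lags in place according to the dynamics of (11).        */
int simpfair_general_step(int n, int64_t q, int64_t Lq,
                          const int64_t p[], int64_t lag[])
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        /* candidate set of (19): scheduling i must not push l_i below -L  */
        if (lag[i] < q - p[i] - Lq)
            continue;
        /* among candidates, pick the one reaching the bound L first,
         * i.e., with minimal (L - l_i)/alpha_i (compared cross-multiplied) */
        if (best < 0 || (Lq - lag[i]) * p[best] < (Lq - lag[best]) * p[i])
            best = i;
    }
    for (int i = 0; i < n; i++)
        lag[i] += p[i] - (i == best ? q : 0);

    return best;
}

Both loops perform a single pass over the n tasks, which also shows the O(n) complexity per scheduling decision noted later.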
Algorithm 1, presented earlier for n = 2, has however a different form than Alg. 2. Hence, we first show that the two algorithms coincide when n = 2 and \(L=\frac{1}{2}\).
This shows that for two tasks, the set \(\mathcal {I}(\boldsymbol {\ell })\) of eligible tasks to be scheduled of Equation (19), coincides with the task selection criterion at Line 2 of Alg. 1 when \(L=\frac{1}{2}\). Hence, we consider Algorithm 2 only for n ≥ 3.
A characteristic of Algorithm 2 is that it requires the specification of a lag bound L. It is therefore natural to investigate the minimum value of L that ensures the set \(\mathcal {I}\) is never empty: by lowering L arbitrarily, the set \(\mathcal {I}\) may indeed become empty, making the algorithm unable to select any task to be scheduled. For any n ≥ 3, we propose to choose
\begin{equation}L=1-\frac{1+\min _i\alpha _i}{2n-2}, \end{equation}
(21)
which was shown to be a valid lag bound [5, Proposition 3.8]. Such a bound refines the lag bound \(1 - \frac{1}{2n-2}\) from [27] (considered in the setting of assignment problems), which was proved to guarantee the existence of a valid schedule S.
Lemma 6.1
Let n ≥ 3. By choosing L as indicated in (21), the set \(\mathcal {I}(\boldsymbol {\ell })\) is not empty at each step.
Proof. From (21), one has \(L=1-\frac{1+\min _i\alpha _i}{2n-2}\ge 1-\frac{1+\frac{1}{3}}{2n-2}\), because \(\min _i\alpha _i\le \frac{1}{n}\), which in turn is not larger than \(\frac{1}{3}\) as n ≥ 3. When n = 3, \(\frac{1+\frac{1}{3}}{2n-2}\) is equal to \(\frac{1}{n}\) (both take the value \(\frac{1}{3}\)); for n > 3, \(\frac{1+\frac{1}{3}}{2n-2}\le \frac{1}{n}\) as well. Hence, (21) implies that L ≥ 1 − 1/n, which in turn yields that the set \(\mathcal {I}(\boldsymbol {\ell })\) is not empty at each step. Otherwise, we would have ℓi < 1 − αi − L for all i and, by summing such inequalities over all i, we would find \(\sum _{i=1}^n \ell _i \lt n -1 -nL \le 0\), which contradicts the fact that \(\sum _{i=1}^n \ell _i =0\) (see (7)). □
Regarding the time complexity, each scheduling decision of SimPFair takes O(n) time: both the candidate set \(\mathcal {I}(\boldsymbol {\ell })\) and the minimizer of (20) can be found in a single pass over the n tasks.
Similarly to the two-task case, the algorithm SimPFair of Alg. 2 can also be viewed as a self-map M (that depends on the choice of α and L) acting on the finite set of rational points \(\frac{1}{q} \mathbb {Z}^n \cap [-L,L]^n\). At time t = 0, we start with zero lag ℓ(0) = 0. As indicated in the lag dynamics of (11), at each step t the next lag is determined by ℓ(t + 1) = M(ℓ(t)), with the i-th component Mi of the self-map M given by \(M_i(\boldsymbol {\ell })=\ell _i+\alpha _i-1\) if i = i*(ℓ) and \(M_i(\boldsymbol {\ell })=\ell _i+\alpha _i\) otherwise, with i*(ℓ) denoting the task to be scheduled when the lag is ℓ.
The next theorem demonstrates that SimPFair “does the job”: it produces a schedule S with the given task utilizations.
Theorem 6.2
Let α = (p1/q, …, pn/q), with p1, …, pn, q being positive integers.
Let \(S: {\mathbb {N}} \rightarrow \lbrace 1,2, \dots , n\rbrace\) be the schedule defined by S(t) = i*(Mt(0)) for all t. This schedule has period q and utilizations vector equal to α. The lag ℓ of the schedule S satisfies ℓ(t) = Mt(0) for all t. Moreover, the schedule S has lag bounded by ⌊L⌋/q with L from (21).
The proof of this theorem follows the same steps as the proof of Theorem 5.1, extended to the case of n tasks instead of 2. Since it offers limited added value, we only report a proof sketch.
Proof sketch
Since L < 1, similarly to the proof of Theorem 5.1, the schedule S has utilization α = (p1/q, …, pn/q), and in [0, q) the task i is scheduled pi times, for each i = 1, …, n.
Hence, for each i, the ith coordinate of Mq(0) is equal to 0, since it is given by \(\ell _i(q)=\alpha _i\, q-|S(0,q)|_i=p_i-p_i=0\).
We then prove by induction that the lag ℓ of the schedule S satisfies ℓ(t) = Mt(0) for all t.
Note also that the values Mt(0) belong to \(\frac{1}{q} {\mathbb {Z}}^n \cap [-L,L]^n.\) Since ℓ(t) = Mt(0) for all t, we get that ⌊L⌋/q is a lag bound. □
The general validity of the lag bound of (21) was proved by Berthé et al. [5]. When SimPFair is fed with such a lag bound, it can produce a feasible schedule. However, some special utilizations α can produce an even smaller lag, as shown in the next example.
Example 6.3
The algorithm SimPFair applied to the utilization vector
\begin{equation}\boldsymbol {\alpha }=\left(\frac{2^{n-1}}{2^n-1},\frac{2^{n-2}}{2^n-1},\ldots ,\frac{2}{2^n-1},\frac{1}{2^n-1}\right) \end{equation}
(22)
produces a schedule S that has a very specific form. For instance, when n = 3, one gets S = 1, 2, 1, 3, 1, 2, 1, …, with α = (4/7, 2/7, 1/7). When n = 4, one gets S = 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1, …, with α = (8/15, 4/15, 2/15, 1/15). The lag bound \(L=\frac{2^{n-1}-1}{2^n-1}\) is 1/3 when n = 2, then 3/7, 7/15, and so on, tending to \(\frac{1}{2}\) from below as n grows. Such a lag bound was found and proved to be tight [8, Proposition 7]. More surprisingly, it was proved [9] that when n ≥ 3, if a schedule S admits a lag bound L smaller than 1/2, then the only possible task utilizations are the ones of (22). Note that this specific utilization vector occurs in the setting of the so-called Fraenkel’s conjecture stated in number theory (see e.g. [12, 28]).
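As a quick check for n = 3, replaying the schedule 1, 2, 1, 3, 1, 2, 1 with α = (4/7, 2/7, 1/7) through the lag dynamics of (11) gives the scaled lags 7 ℓ(t): (−3, 2, 1), (1, −3, 2), (−2, −1, 3), (2, 1, −3), (−1, 3, −2), (3, −2, −1), (0, 0, 0); the largest absolute entry is 3, i.e., a lag bound of \(\frac{3}{7}=\frac{2^{n-1}-1}{2^n-1}\), as stated.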
6.2 A tightness result
Motivated by the previous example, which shows a case with a considerably lower lag, and following the logic of Theorem 5.2, which demonstrates the tightness of the lag bound, one might consider it possible to demonstrate an analogous result for the general case of n ≥ 3. Specifically, the next theorem shows that the chosen lag bound L of (21) is essentially the best bound achievable for arbitrary rational utilizations α.
Theorem 6.4
For every ε > 0 with \(\varepsilon \lt \frac{1}{2}\), there exist rational utilizations \(\boldsymbol {\alpha }\in \mathbb {Q}^n\), with αi ∈ (0, 1) and \(\sum _{i=1} ^n \alpha _i=1\), such that no schedule S exists with lag bound satisfying
The proof is by contradiction. We will construct a particular utilization vector α and exhibit a particular time t. The assumption of the lag bound L satisfying (23) will then lead to a contradiction by comparing the schedule at times t and t + n − 2.
Let us fix ε > 0 with \(\varepsilon \lt \frac{1}{2(n-1)}\). We first fix \(q\in \mathbb {N}\) large enough to guarantee the existence of both a positive \(r\in \mathbb {N}\) satisfying (24) and a positive \(p\in \mathbb {N}\), coprime with q, satisfying (25).
This is always possible due to the density of \(\mathbb {Q}\). The integer p is used next to define the utilization vector α. The integer r will be used later to define a suitable time t.
We then define the utilization vector as
\begin{equation}\boldsymbol {\alpha }=\Big (\underbrace{\tfrac{p}{q},\ldots ,\tfrac{p}{q}}_{n-1\ \text{times}},\ 1-(n-1)\tfrac{p}{q}\Big ). \end{equation}
(26)
The above choice is made to have the first n − 1 utilizations close to \(\frac{1}{n-1}\) (from (25)) and the last one, \(\alpha _n=1- (n-1)\frac{p}{q} \lt \varepsilon\) (again from (25)), “small enough”.
We now have to exhibit a suitable time t. For any \(t \in {\mathbb {N}}\) and for any i = 1, …, n, we introduce the notation
• ri,t to denote the remainder in the Euclidean division of t · pi by q, and
• di,t to denote the quotient of the same Euclidean division, that is, di,t = ⌊tpi/q⌋,
so that
\begin{equation}t\,\alpha _i=\frac{t\, p_i}{q}=d_{i,t}+\frac{r_{i,t}}{q}, \end{equation}
(27)
with \(d_{i,t}\in {\mathbb {N}}\) and \(\frac{r_{i,t}}{q} \in [0,1)\). Moreover, by the choice of α of (26), one has ri,t = rj,t and di,t = dj,t for all 1 ≤ i, j ≤ n − 1 and for each t.
We now fix the remainder ri, t as follows
\begin{equation}r_{i,t}=r, \ 1\le i \le n-1, \end{equation}
(28)
with r satisfying (24). This means that we choose t such that \(t\, p\equiv r \pmod q\).
Such an integer t always exists since p and q are coprime. Note that ri, t denotes the generic remainder for any t, whereas r is the desired remainder satisfying (24), which leads to the contradiction. Equating r and ri, t provides t.
Let us prove that, for the above choices, one has \(\sum _{i=1}^n \frac{r_{i,t}}{q}=1\). We need this intermediate step on our way to the contradiction. Since \(\sum _{i=1}^n \alpha _i =\sum _{i=1}^n \frac{p_i}{q} =1\), by summing Eq. (27) over all i we find that \(\sum _{i=1}^n \frac{r_{i,t}}{q}=t-\sum _{i=1}^n d_{i,t}\) is an integer, and that \(\sum _{i=1}^n \frac{r_{i,t}}{q}=(n-1)\frac{r}{q}+\frac{r_{n,t}}{q}\lt 2\) by (24).
Note that we used \(r_{n,t}\lt q\ \Rightarrow \ \frac{r_{n,t}}{q}\lt 1\) above. Hence, since \(\sum _{i=1}^n \frac{r_{i,t}}{q}\) is an integer in [0, 2) and it is non-zero (as r > 0), we deduce \(\sum _{i=1}^n \frac{r_{i,t}}{q}=1\).
We now prove by contradiction that no schedule S with utilization α can have a lag bound L satisfying (23). Let us assume that such a schedule S exists. First observe that, since L < 1, one has either
\begin{equation}|S(0,t)|_i =d_{i,t}, \mbox{ or } 1+d_{i,t}. \end{equation}
If |S(0, t)|i = di, t + 1, then |S(0, t)|i − tαi > L, a contradiction with the fact that L is a lag bound. Consequently
\begin{equation}|S(0,t)|_i =d_{i,t}, \mbox{ for } 1\le i \le n-1. \end{equation}
(31)
We now compare the schedule S at times t and t + n − 2.
First consider what happens at time t + n − 2 for the first n − 1 tasks i, with 1 ≤ i ≤ n − 1. One has αi = p/q. Let us compare the quotient and remainder of the Euclidean division by q (as in (27)) of tp + (n − 2)p with those of tp. According to the RHS of (24) and (25), one has
\[\begin{eqnarray*}0\le r +(n-2) p \lt \frac{1+\varepsilon }{2(n-1)}q+(n-2) \frac{1}{n-1}q \lt q.\end{eqnarray*}\]
This shows that the quotients of the Euclidean division by q of tp + (n − 2)p and of tp are the same. Thus, for i = 1, …, n − 1, \(d_{i,t+n-2}=d_{i,t}\) and \(r_{i,t+n-2}=r+(n-2)\,p\).
The tightness result for n = 2 tasks of Theorem 5.2 is stronger, as it gives the tightest bound for each specific choice of α. Theorem 6.4, instead, proves the tightness of a bound holding for all α, and thus involves a higher lag bound. A similar result is due to Tijdeman [26] for irrational utilizations α. Since Theorem 6.4 is proved for rational utilizations, we implicitly show that there is no gain in choosing rational utilizations w.r.t. the irrational case.
7 Conclusion and future works
In this paper, we have investigated algorithms to achieve fairness with a lower lag than known ones. One application-oriented future direction is the implementation in real kernels. In fact, the extreme simplicity of the code in Listing 1 is promising. Other theoretical aspects, such as the extension to the multiprocessor case, are also of interest to us.
Acknowledgments
This work is partially supported by the project “Trustworthy Cyber-Physical Pipelines”, funded by the MAECI Italy-Sweden co-operation id. PGR02086, and the spoke “FutureHPC and BigData” of the ICSC — Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing funded by European Union — NextGenerationEU. It is also partially supported by the Agence Nationale de la Recherche through the project “SymDynAr” (ANR-23-CE40-0024-01).
References
[1] Boris Adamczewski. 2003. Balances for fixed points of primitive substitutions. Theoret. Comput. Sci. 307, 1 (2003), 47–75.
James H. Anderson and Anand Srinivasan. 2000. Early-release fair scheduling. In Proceedings 12th Euromicro Conference on Real-Time Systems. Euromicro RTS 2000. IEEE, 35–43.
Sanjoy K. Baruah, Neil K. Cohen, Greg Plaxton, and Donald A. Varvel. 1996. Proportionate Progress: A Notion of Fairness in Resource Allocation. Algorithmica 15, 6 (jun 1996), 600–625.
Jean Berstel, Aaron Lauve, Christophe Reutenauer, and Franco V. Saliola. 2008. Combinatorics on Words: Christoffel Words and Repetitions in Words. Vol. 27. American Mathematical Society. xii+147 pages.
Enrico Bini. 2016. Adaptive fair scheduler: Fairness in presence of disturbances. In Proceedings of the 24th International Conference on Real-Time Networks and Systems. ACM, 129–138.
Nadia Brauner and Vincent Jost. 2008. Small deviations, JIT sequencing and symmetric case of Fraenkel’s conjecture. Discrete Math. 308, 11 (2008), 2319–2324.
Abhishek Chandra, Micah Adler, and Prashant Shenoy. 2001. Deadline fair scheduling: bridging the theory and practice of proportionate pair scheduling in multiprocessor systems. In Proceedings of the 7th IEEE Real-Time Technology and Applications Symposium (RTAS 2001), 30 May - 1 June 2001, Taipei, Taiwan. IEEE, 3–14.
Alan J. Demers, Srinivasan Keshav, and Scott Shenker. 1989. Analysis and simulation of a fair queueing algorithm. ACM SIGCOMM Computer Communication Review 19, 4 (1989), 1–12.
Manolis Katevenis, Stefanos Sidiropoulos, and Costas Courcoubetis. 1991. Weighted round-robin cell multiplexing in a general-purpose ATM switch chip. IEEE Journal on selected Areas in Communications 9, 8 (1991), 1265–1279.
Luciano Lenzini, Enzo Mingozzi, and Giovanni Stea. 2002. Aliquem: a novel DRR implementation to achieve better latency and fairness at O (1) complexity. In IEEE 2002 Tenth IEEE International Workshop on Quality of Service (Cat. No. 02EX564). IEEE, 77–86.
Chung Laung Liu and James W Layland. 1973. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM (JACM) 20, 1 (1973), 46–61.
Thomas Megel, Renaud Sirdey, and Vincent David. 2010. Minimizing task preemptions and migrations in multiprocessor optimal real-time schedules. In Proceedings of the 31st IEEE Real-Time Systems Symposium, RTSS 2010, San Diego, California, USA, November 30 - December 3, 2010. IEEE, 37–46.
Geoffrey Nelissen, Hang Su, Yifeng Guo, Dakai Zhu, Vincent Nélis, and Joël Goossens. 2014. An optimal boundary fair scheduling. Real-Time Systems 50, 4 (2014), 456–508.
Abhay K. Parekh and Robert G. Gallager. 1993. A generalized processor sharing approach to flow control in integrated services networks: the single-node case. IEEE/ACM transactions on networking 1, 3 (1993), 344–357.
Madhavapeddi Shreedhar and George Varghese. 1996. Efficient fair queuing using deficit round-robin. IEEE/ACM Transactions on networking 4, 3 (1996), 375–385.
George Steiner and Julian Scott Yeomans. 1996. Optimal level schedules in mixed-model, multi-level JIT assembly systems with pegging. European Journal of Operational Research 95, 1 (November 1996), 38–52.
Seyed Mohammadhossein Tabatabaee, Jean-Yves Le Boudec, and Marc Boyer. 2021. Interleaved weighted round-robin: A network calculus analysis. IEICE Transactions on Communications 104, 12 (2021), 1479–1493.
Yao-Tzung Wang, Tzung-Pao Lin, and Kuo-Chung Gan. 1994. An improved scheduling algorithm for weighted round-robin cell multiplexing in an ATM switch. In Proceedings of ICC/SUPERCOMM’94-1994 International Conference on Communications. IEEE, 1032–1037.
Jianjia Wu, Jyh-Charn Liu, and Wei Zhao. 2007. Utilization-bound based schedulability analysis of weighted round robin schedulers. In 28th IEEE International Real-Time Systems Symposium (RTSS 2007). IEEE, 435–446.
Dakai Zhu, Daniel Mossé, and Rami Melhem. 2003. Multiple-resource periodic scheduling problem: how much fairness is necessary?. In Proceedings of the 24th IEEE Real-Time Systems Symposium (RTSS 2003), 3-5 December 2003, Cancun, Mexico. IEEE, 142–151.