1 Introduction

Due to their applicability, Dial-a-Ride Problems (DARP) have been studied from the perspective of operations research, management science, combinatorial optimization, and theoretical computer science. There are numerous variants, but fundamentally all DARP variants require the scheduling of vehicle routes to satisfy a collection of pickup and delivery requests, or rides, from specified origins to specified destinations. Each ride can be viewed as a request between two points in an underlying metric space, with the ride originating at a source and terminating at a destination. Requests may be constrained to be served within specified time windows, they may carry associated weights, their details may be known in advance or revealed only when they become available, and there may be various metrics to optimize. For most variants the goal is to find a schedule that allows the vehicle(s) to serve requests within the constraints while meeting a specified objective. Much of the motivation for DARP arises from the numerous practical applications of the transport of both people and goods, including delivery services, ambulances, ride-sharing services, and paratransit services.

We study offline DARP on the uniform metric (i.e. where the distance between every pair of distinct locations is the same) with a single server, or vehicle, of unit capacity, where each request has a source and destination. The server has a specified deadline after which no more requests may be served, and the goal is to schedule a maximum-cardinality subset of requests to serve. We note that a more general version of DARP allows each request to have an associated revenue, with the goal of maximizing total revenue earned; our problem is equivalent to the setting with uniform revenues, so we refer to it as URDARP.

This form of the problem is applicable, for example, in urban settings where it is reasonable to assume that a driver would like to serve as many requests as possible and these requests take roughly the same amount of time to serve [3]. We found that even this fundamental variant is NP-hard, that its analysis is surprisingly elusive, and that understanding it is essential for extending to more general versions.

For example, we have found that the natural DARP algorithm for the nonuniform revenue and uniform metric variant, that greedily chooses the highest-revenue request to serve in each iteration, can be at best a 1/2-approximation. Lemma 12 in Section 5 details what happens when the greedy strategy is based on largest revenue. We have found a similar outcome for the variant of DARP on a non-uniform metric with uniform revenues; Lemma 13 in Section 5 details what happens when the greedy strategy is based on shortest request.

We therefore consider the uniform-revenue variant on the uniform metric, and study algorithms that give preference to sequences of requests that are “chained” together, i.e., the destination of any (non-final) request in the sequence is the source of the next request. In particular, we consider an algorithm we call \(\textsc {twochain}\) that gives preference to requests that are in chains of length at least two. We focus on this algorithm and URDARP because, with an understanding of this fundamental setting, we can then work to break the barrier of 1/2 in the setting with general revenues.

In sum, the focus of this work is on offline URDARP (i.e. DARP with a single-unit-capacity vehicle, on the uniform metric, with uniform revenues). We first enhance the conference version [1] of this work by more fully placing the problem in a larger context, expanding the discussion of related work. We then begin the technical contribution by showing that even this basic form of the problem is NP-hard, by a reduction from the Hamiltonian Path problem. We then show that our \(\textsc {twochain}\) algorithm yields a 2/3-approximation, and exhibit an instance where \(\textsc {twochain}\) serves exactly 2/3 of the optimal number of requests. Since \(\textsc {twochain}\) yields a tight 2/3-approximation, we initially expected that the natural generalization of the algorithm, an algorithm we call k-chain, would yield a \(k/(k+1)\)-approximation. Surprisingly, it does not. We exhibit an instance of URDARP where k-chain serves no more than 7/9 of the number of requests served by the optimal solution. We follow that with a discussion of how even the (non-polynomial-time) algorithm that greedily chooses the longest chain, which we refer to as longest-chain-first (\(\textsc {lcf}\)), gives at most a 5/6-approximation. We extend our earlier conference paper [1] with additional results on the \(\textsc {lcf}\) algorithm, and a new section of experimental work comparing the performance of k-chain for varying k with a polynomial-time variant of \(\textsc {lcf}\). We further add, in our concluding remarks, a general upper bound of 2/3 on the competitive ratio of any deterministic online algorithm, as we look toward online and non-uniform variants of this problem.

1.1 Related Work

DARP has been extensively studied, with numerous variants, including the number of vehicles, the objectives, the presence of time windows, and how the request sequence is issued (i.e. offline or online). The 2007 survey The dial-a-ride problem: models and algorithms [4] provides an overview of some of the models and algorithms, including heuristics, that have been studied. A decade later, Typology and literature review for dial-a-ride problems [5] focuses on classifying the existing literature based upon applicability to particular real-world problems, again including both algorithms with theoretical guarantees and heuristics. A 2018 survey [6] catalogues the many DARP variants that have been studied mainly in the operations research and transportation science domains. To our knowledge, despite its relevance to modern-day transportation systems, aside from our work in [1], the version of the problem we investigate here, URDARP, has not been previously studied in the literature, on either the uniform or a general metric space.

Initially restricting the space of investigation to the uniform metric is an approach that has been taken by many prior works in a number of areas, and has often subsequently led to more general results. Such works include those studying the k-server problem [7,8,9], metrical task systems [10], buffer sorting [11,12,13], minimum bipartite matching [14,15,16,17], facility location [18], and computational geometry [19].

Approximation algorithms for standard offline variants of DARP problems have been known for decades. For example, when minimizing the cost to serve all requests, Frederickson et al. [20] gave a 9/5-approximation algorithm for DARP on general graphs with unit server capacity, which was later improved to 5/4 on trees by Frederickson and Guan [21]. For servers with capacity c, Charikar and Raghavachari [22] gave a (\(\sqrt{c}\log n\log \log n\))-approximation algorithm, as well as a 2-approximation for the case of line metrics. Gupta et al. [23] later gave a different algorithm with a similar approximation ratio for the problem. Approximation algorithms have not been the only approach taken for DARP problems: on a caterpillar graph where the locations of requests are chosen uniformly at random, Coja-Oghlan et al. [24] found that an MST heuristic optimally solves the problem with high probability. Given the numerous variants of DARP and the lack of a one-size-fits-all solution, we now highlight some variants that are most similar to the problem we investigate.

For the variant of DARP where the goal is to maximize the number of requests served before a deadline (as it is with URDARP), [25] has shown that the Segmented Best Path algorithm of [26, 27] is a 4-approximation on non-uniform metric spaces. That work also shows that a greedy algorithm that repeatedly serves the fastest set of k remaining requests has an approximation ratio of \(2+\lceil \lambda \rceil /k\), where \(\lambda \) denotes the ratio between the maximum and minimum distances between nodes.

Blum et al. [28] gave the first constant-factor approximation algorithm for the Orienteering Problem, where the input is a weighted graph with rewards on nodes, and the goal is to find a path that, starting at a specified origin, maximizes the total reward collected, subject to a limit on the path length. This constraint on path length is analogous to the deadline constraint we have in our model of URDARP, so the Orienteering Problem is equivalent to assuming each URDARP request has its source node equal to its destination node.

URDARP is also closely related to the Prize Collecting Traveling Salesperson Problem (PCTSP) where the server earns a revenue (or prize) for every location it visits and a penalty for every location it misses, but the goal in PCTSP is to collect a specified amount of revenue while minimizing travel costs and penalties. PCTSP was introduced by Balas [29], and Bienstock et al. [30] gave the first approximation algorithm for it, with ratio 5/2. Later, Goemans and Williamson [31] developed a primal-dual 2-approximation algorithm. More recently, building off of the work in [31], Archer et al. [32] improved the ratio to \(2-\epsilon \), a significant result as the barrier of 2 was thought to be unbreakable [33].

Bienstock et al. developed a 2-approximation for a version of PCTSP where there is a cost for each edge and a penalty for each vertex, and the goal is to find a tour on a subset of the vertices that minimizes the sum of the cost of the edges in the tour and the vertices not in the tour [30].

2 Preliminaries

The input to URDARP is a uniform metric space, a set of requests, and a time limit T. Each request has a source point and a destination point in the metric space. Since the metric is uniform, every drive between two distinct points takes one unit of time. A unit capacity server, or vehicle, starts at a designated location in the metric space, the origin. The goal is to move the server through the metric space, serving requests one at a time so as to maximize the number of requests served in T time units. For a URDARP instance I, \(\textsc {opt}(I)\) denotes an optimal schedule on I.

We refer to a move from one location to another as a drive. If a drive serves a request, we refer to it as a service drive (sometimes referred to in the literature as a carrying move). If the drive is solely for the purpose of moving the server from one location to another and not serving a request, we refer to it as an empty drive (sometimes referred to in the literature as an empty move). We refer to a sequence of one or more requests that are served without any intermediary empty drives as a chain; a chain of two such requests is a 2-chain and, more generally, a chain of k requests is a k-chain.

At first glance, the problem is simply one of chaining requests cleverly. If one request’s destination is the source of another request, the two can be served consecutively with no wasted time. When this is not the case, any algorithm must spend one unit of time traveling before serving the next request. Hence, any solution to the problem consists of a sequence of chains of served requests, separated by unit-length empty drives.

2.1 Hardness

While it was already shown in [34] that DARP on a general metric with general revenues is NP-hard, we now show that even URDARP, where the metric is uniform and the requests have uniform revenue, is NP-hard by reduction from the Hamiltonian Path problem. The reduction proceeds as follows.

Fig. 1

An example instance G of the Hamiltonian Path problem where \(n=5\) (left), and its corresponding instance for URDARP where \(T=2n+1\) (right). Note that the metric space of the URDARP instance is complete, but for simplicity we show only relevant edges. Any Hamiltonian path on a graph of n vertices has length \(n-1\), which corresponds to a sequence of \(2n-1\) requests in the URDARP instance (since a URDARP request is created for each vertex and each edge of G). Though t may seem unnecessary, without the added edges from each point \(v''\) to t in the URDARP instance, sequences of length \(2n-1\) do not guarantee the existence of a Hamiltonian path in G. In this figure, although G lacks a Hamiltonian path, the URDARP instance still has a sequence of requests of length \(2n-1\): \(a''\rightarrow e' \rightarrow e'' \rightarrow b' \rightarrow b'' \rightarrow a' \rightarrow a'' \rightarrow c' \rightarrow c'' \rightarrow b'\). The added edges to t ensure that any Hamiltonian path in G corresponds to a URDARP sequence of length 2n rather than \(2n-1\), preventing such false positives.

Given a directed Hamiltonian Path input \(G = (V,E)\) where \(n = \vert V\vert \), build a uniform metric space \(G'\) with \(2n+2\) points (see Fig. 1): one point will be the server origin o, one will be a designated “sink” point t, and the other 2n points are as follows. For each node \(v \in V\), create a point \(v'\) and a point \(v''\) in \(G'\). Create a URDARP request in \(G'\) from point \(v'\) to point \(v''\) for each \(v \in V\), which we will refer to as a node request. Further, for each edge \(e = (u,v)\) in E of G, create a URDARP request from point \(u''\) to point \(v'\) in \(G'\), which we will refer to as an edge request. Additionally, for each \(v \in V\), create an edge request from \(v''\) to the designated sink point t in \(G'\). Set \(T = 2n + 1\). Finally, make the server origin a separate point which, as with every pair of distinct points in the uniform metric, is one unit away from all other points.
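
To make the construction concrete, the following is a minimal Python sketch of the reduction; the representation of points as labeled strings and the function name are illustrative assumptions, not part of the formal construction above.

```python
def hampath_to_urdarp(vertices, edges):
    """Build the URDARP instance described above from a directed graph G = (V, E).

    Points are labeled strings; each request is a (source, destination) pair.
    Returns the request set, the origin, the sink, and the time limit T = 2n + 1.
    """
    n = len(vertices)
    origin, sink = "o", "t"
    requests = []
    # Node requests: v' -> v'' for every vertex v of G.
    for v in vertices:
        requests.append((f"{v}'", f"{v}''"))
    # Edge requests: u'' -> v' for every directed edge (u, v) of G.
    for (u, v) in edges:
        requests.append((f"{u}''", f"{v}'"))
    # Additional edge requests into the sink: v'' -> t for every vertex v.
    for v in vertices:
        requests.append((f"{v}''", sink))
    return requests, origin, sink, 2 * n + 1

# e.g. hampath_to_urdarp(["a", "b", "c", "d", "e"], edges_of_G) for the graph of Fig. 1,
# where edges_of_G lists the directed edges of that (hypothetical) example graph.
```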

Lemma 1

There is a Hamiltonian Path in G if and only if 2n requests can be served within time \(T=2n+1\) in the URDARP instance.

Proof

Let \(p = (v_1, v_2, \ldots , v_n)\) be a Hamiltonian Path in G. Construct the following sequence of 2n URDARP requests in \(G'\): the node request from \(v_1'\) to \(v_1''\), the edge request from \(v_1''\) to \(v_2'\), the node request from \(v_2'\) to \(v_2''\), the edge request from \(v_2''\) to \(v_3'\), and so forth, through the edge request from \(v_{n-1}''\) to \(v_n'\), the node request from \(v_n'\) to \(v_n''\), and finally the edge request from \(v_n''\) to the designated sink t. This sequence can be executed in time \(T=2n+1\) since it requires one unit of time for the server to drive from the origin to \(v_1'\) and 2n units for the remaining drives.

Conversely, consider a URDARP schedule in \(G'\) that serves 2n requests within time \(T=2n+1\). Since the server must spend one time unit driving from the origin to the source of its first request, there is no time for any further empty drives, so the 2n requests form a single chain. Note that by construction of \(G'\), any chain of URDARP requests must alternate between node requests and edge requests, where any request into the sink is counted as an edge request (and must be a terminal request). Since destinations in \(G'\) can be partitioned into the sink, single-primed points, and double-primed points, we can thus analyze the three possibilities for the destination of the final URDARP request.

If either the sink or a single-primed point is the destination of the final URDARP request, the chain must end with an edge request. The alternating structure ensures the chain begins with a node request, and thus contains exactly n node requests and n edge requests. If a double-primed point is the destination of the final URDARP request, the chain must end with a node request. The alternating structure ensures the chain begins with an edge request, and again contains exactly n edge requests and exactly n node requests. Thus, the chain always contains n node requests, one for each vertex of G, and consecutive node requests are joined by edge requests corresponding to edges of G. The corresponding path in G therefore visits all n vertices of G, and thus G contains a Hamiltonian Path.\(\square \)

The above reduction procedure, together with Lemma 1, gives the following theorem.

Theorem 2

The problem URDARP is NP-hard.

3 Algorithms

We begin by presenting our twochain algorithm that is a 2/3-approximation for URDARP (please see Algorithm 1 for details). The idea of this polynomial-time algorithm is that it simply looks for chains of requests of length at least 2 whenever a drive is required. At each time unit, if there is a request that starts at the current location of the server, the server will always serve such a request (continuing the chain) rather than driving away to a different request. In addition, the server is never “idle”: if there are remaining requests that can be served before the deadline, the server will drive to serve one of them.

Note that Algorithm 2 (which we describe in detail in Section 3.2) is a generalization of the twochain algorithm, and equivalent to twochain when setting \(k=2\). We include Algorithm 1 here for clarity and completeness.

It is reasonable to hope that an algorithm like \(\textsc {twochain}\) would achieve a 2/3-approximation: even if the optimal solution serves a request in every time unit, for a total of T requests, \(\textsc {twochain}\) could optimistically serve at least a pair of requests for every 3 time units. We also see from the simple instance in Theorem 6 that it can in fact guarantee no better than 2/3. The difficulty arises in the possibility that \(\textsc {twochain}\) may choose its chains in a way that breaks too many chains into singleton requests, leaving it short of the 2/3 fraction of the optimal number. But intuitively such a pathological situation cannot happen if the optimal solution serves a request in every time unit, and also cannot happen if the time limit is large enough compared to the number of requests. Hence the analysis will require a subtle interplay between the time limit and the optimum number of requests served. Through an admittedly painstaking case analysis, we are able to show that the 2/3-approximation is in fact achieved.

While the analysis of \(\textsc {twochain}\) we provide requires many detailed cases, we were surprised to discover that simpler, more elegant approaches (e.g., an exchange argument, induction on the time t, a potential function) all fell short in subtle ways, indicating that the problem is more nuanced than one might expect at first blush. We believe that the interplay between requests and the possibility of numerous criss-crosses of chains of requests prevents simpler analyses. We note that our proof actually yields a guarantee of not only 2/3 of the optimal number of requests, but in fact the stronger bound of \(\tfrac{1}{3}(\vert \textit{OPT}\vert +T-1)\), where T is the time limit.

3.1 The TWOCHAIN Algorithm

Let S, T, and o denote the set of requests, time limit, and origin, respectively. Let \(\textit{OPT}(S,T, o)\) and \(ALG(S,T,o)\) denote the schedules returned by the optimal algorithm OPT and \(\textsc {twochain}\), respectively, on the instance \((S,T,o)\), and thus let \(\vert \textit{OPT}(S, T, o)\vert \) and \(\vert ALG(S, T, o)\vert \) denote the number of requests served by OPT and \(\textsc {twochain}\), respectively.

Algorithm 1

The \(\textsc {twochain}\) algorithm.
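
Since the pseudocode of Algorithm 1 appears only as a figure, the following is a minimal Python sketch of the \(\textsc {twochain}\) strategy as we have described it in prose; the request representation, helper names, and tie-breaking rule are illustrative assumptions rather than the published pseudocode.

```python
def twochain(requests, origin, T):
    """A sketch of the TWOCHAIN strategy: never drive away from an available
    request, and when an empty drive is needed, prefer the start of a 2-chain.

    `requests` is a list of (source, destination) pairs on a uniform metric,
    so every drive (empty or service) takes one time unit.  Returns the
    requests served, in order.  Ties are broken arbitrarily, as in the paper.
    """
    remaining = list(requests)
    loc, t, served = origin, 0, []

    def starts_two_chain(req):
        # req begins a 2-chain if some other remaining request starts at its destination
        return any(r is not req and r[0] == req[1] for r in remaining)

    while t < T and remaining:
        here = [r for r in remaining if r[0] == loc]
        if here:
            # serve a request at the current location, preferring one that continues a chain
            req = next((r for r in here if starts_two_chain(r)), here[0])
            remaining.remove(req)
            served.append(req)
            loc, t = req[1], t + 1          # one time unit for the service drive
        elif t + 2 <= T:
            # empty drive: move to the start of a 2-chain if one exists, else to any request
            starts = [r for r in remaining if starts_two_chain(r)] or remaining
            loc, t = starts[0][0], t + 1    # one time unit for the empty drive
        else:
            break                           # not enough time left to drive and then serve
    return served
```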

We begin by showing that in the special case where the deadline is more than twice the number of requests, \(\textsc {twochain}\) is optimal.

Lemma 3

If \(T\ge 2\vert S\vert \) then \(\vert \textit{OPT}(S,T, o)\vert = \vert ALG(S, T, o)\vert = \vert S\vert \).

Proof

By induction on \(\vert S\vert \). If \(\vert S\vert = 1\), then clearly \(\textsc {twochain}\) can serve the request if \(T \ge 2\). If \(\vert S\vert \ge 2\), then within the first two time units \(\textsc {twochain}\) serves at least one request, leaving at most \(\vert S\vert -1\) requests to serve within \(T-2\) time. Since \(T \ge 2\vert S\vert \), we have \(T-2 \ge 2(\vert S\vert -1)\), so by the inductive hypothesis, \(\textsc {twochain}\) serves all the remaining requests within the remaining time.\(\square \)

We note, in fact, that the result claimed in Lemma 3 is true for any reasonable algorithm that does not perform two consecutive empty drives while there is still an available request. This categorization of reasonable algorithms includes the longest-chain-first algorithm which we will describe in Section 3.3.

We now must face the possibility that \(\textsc {twochain}\) may choose its chains in a way that causes too many chains to be broken up and left in pieces of single requests for the algorithm to stay within the fraction of 2/3 of the optimal. Intuitively, as Lemma 3 confirms, such a situation cannot happen if the time limit is large enough compared to the number of requests. Hence the bulk of the analysis will take place in the next lemma, where we tackle the general case when the deadline T is tighter. This proof requires management of a subtle interplay between the time limit and the optimum number of requests served. We ultimately manage to prove a lower bound on what \(\textsc {twochain}\) earns, that will suffice for later showing \(\textsc {twochain}\) yields a 2/3-approximation.

Lemma 4

Let \(m=\vert \textit{OPT}(S,T, o)\vert \). If \(T < 2\vert S\vert \), then \(\vert ALG(S, T, o)\vert \ge \tfrac{1}{3}(m + T - 1).\)

Proof

Since \(T < 2\vert S\vert \), \(S\ne \emptyset \). Let k denote the number of requests in the first chain served by \(\textsc {twochain}\) and denote this chain as \((u_0, u_1), (u_1, u_2), \ldots , (u_{k-1}, u_k)\). Let c denote the number of drives \(\textsc {twochain}\) makes to get to the first request, that is, either \(c=0\) if there is a request starting at o and \(c=1\) if not. After \(\textsc {twochain}\) serves the first chain, we are left with a smaller instance of the problem \((S_{new}, T_{new}, o_{new})\) where \(S_{new} = S - \{(u_0, u_1), (u_1, u_2), \ldots , (u_{k-1}, u_k)\}\), \(T_{new} = T-c-k\), and \(o_{new} = u_k\).

We proceed by strong induction on T. If \(T=0, 1,\) or 2, then the lemma is trivially true. If \(T \ge 3\), then since \(\vert S| > T/ 2\), \(\textsc {twochain}\) serves at least one chain. We assume inductively that

$$| ALG(S_{new}, T_{new}, o_{new})\vert \ge \tfrac{1}{3}(| \textit{OPT}(S_{new}, T_{new}, o_{new})| + T_{new} - 1)$$

and will show

$$\vert ALG(S, T, o)\vert \ge \tfrac{1}{3}(\vert \textit{OPT}(S, T, o)\vert + T - 1).$$
  1. Case 1:

    \(k=1\), so \(\textsc {twochain}\) serves only a single request in its first chain.

    1. Case 1.1:

      \(c=1\). Then there is no ride starting at o and the first chain has length 1, so we know that there must be no 2-chains in S. Then all solutions require an empty drive after each service drive, so \(\vert ALG(S, T, o)\vert = m = \lfloor T/2 \rfloor \ge \tfrac{T}{2} - \tfrac{1}{2} \) and hence, \(m \ge \tfrac{1}{3} (m +T-1)\).

    2. Case 1.2:

      \(c=0\). Then there is a ride starting at o but there is no 2-chain that starts at o. Let \(\textit{OPT}(S,T,o)\) return the path \((o,v_1),(v_1,v_2), \ldots ,\) \((v_{T-1},v_T)\). Therefore \((o,v_1)\) and \((v_1,v_2)\) cannot both be rides. Then the path \((v_2, v_3), \ldots ,(v_{T-1},v_T)\) has at least \(m-1\) rides from S and therefore at least \(m-2\) rides from \(S_{new}=S-\{(o,u_1)\}\). So the path \((o_{new},v_2),(v_2, v_3),\ldots ,(v_{T-1},v_T)\) also has at least \(m-2\) rides from \(S_{new}\). Thus \(\vert \textit{OPT}(S_{new}, T-1, u_1=o_{new})\vert \ge m-2\). By induction, \(\vert ALG(S,T,o)\vert \ge 1 +\frac{1}{3} (\vert \textit{OPT}(S_{new},T-1,u_1)\vert +(T-1)-1) \ge 1 + \frac{1}{3} (m-2 +T-1-1) = \frac{1}{3}(m+T-1)\).

  2. Case 2:

    \(k \ge 2\), so \(\textsc {twochain}\) serves at least two requests in its first chain. There are two subcases.

    1. Case 2.1:

      \(T_{new} \!\ge \! 2\vert S_{new}\vert \). In this case, by Lemma 3 we have \(| ALG(S_{new}, T_{new}, o_{new})| \) \( = | S_{new}| = | S| - k\). So we have:

      $$\begin{aligned} \vert ALG(S,T,o)\vert = k + \vert ALG(S_{new}, T_{new}, o_{new})\vert = k + \vert S\vert -k = \vert S\vert \end{aligned}$$

      Hence, \(\vert \textit{OPT}(S, T, o)\vert =\vert S\vert \) as well, so recalling that \(T<2\vert S\vert \), we have, as desired:

      $$\begin{aligned} \vert ALG(S,T,o)\vert = \vert S\vert = \tfrac{1}{3} (\vert S\vert + 2\vert S\vert ) > \tfrac{1}{3} (\vert \textit{OPT}(S, T, o)\vert + T-1). \end{aligned}$$
    2. Case 2.2:

      \(T_{new} < 2\vert S_{new}\vert \). The above cases are special cases, but now we have arrived at the general case that is more difficult to handle. The strategy in what follows will be to identify a path \(P^*\) that is a sub-path of the original \(\textsc {opt}\) path. We will use parts of \(P^*\) to carefully form a candidate path P that becomes a possible path (and hence yields a lower bound) for \(\textsc {opt}\) for the smaller sub-instance. Let the path \(P^*\) of length \(T+1-c\) be the path that traverses \(\textit{OPT}(S, T,o)\) starting from \(o_{new}\). More formally, if \(c=0\), \(P^*\) is \((o_{new}, o), (o, v_1), (v_1, v_2), \ldots (v_{T-1}, v_T)\). If \(c=1\), then since \((o, v_1)\) is not in S, \(P^*\) is \((o_{new}, v_1), (v_1, v_2), \ldots (v_{T-1}, v_T)\). Let r denote the number of requests in \((u_0, u_1), (u_1, u_2), \ldots , (u_{k-1}, u_k)\) that are also in \(\textit{OPT}(S, T, o)\) and note that \(r \le k\). So \(P^*\) has m requests from S and \({m-r}\) requests from \(S_{new}\). Note that \(T_{new} = T - c-k = (T+1-c)-(k+1) = \vert P^*\vert - (k+1)\). We modify \(P^*\) to create a path P by deleting the last \(k+1\) drives from \(P^*\). Then \(\vert P\vert = T_{new}\) and P has at most \(k+1\) fewer requests from \(S_{new}\) than \(P^*\) so P has at least \(m-r-(k+1)\) requests from \(S_{new}\). Hence, we have:

      $$\begin{aligned} \vert \textit{OPT}(S_{new}, T_{new}, o_{new})\vert \ge m-r-k-1 \end{aligned}$$
      (1)

      There are two subcases.

      1. Case 2.2.1:

        If \(-r+k-1 \ge c\), i.e., at least \(c+1\) requests of the first \(\textsc {twochain}\) chain are not served by \(\textsc {opt}\), we have:

        $$\begin{aligned} \vert ALG(S,T,o)\vert&= k + \vert ALG(S_{new},T_{new},o_{new})\vert \\&\ge k + \tfrac{1}{3} (\vert \textit{OPT}(S_{new}, T_{new}, o_{new})\vert + T_{new} -1) \ \\&\text {by inductive hypothesis} \\&\ge k + \tfrac{1}{3} (m-r-k-1 + T_{new} - 1)~{\text {by}~(1)} \\&\ge \tfrac{1}{3}(m + T -1 -r +k-1-c) \\&\ge \tfrac{1}{3}(m + T -1) \end{aligned}$$

        which is the desired equation.

      2. Case 2.2.2:

        If \(-r+k-1 < c\), then \(k-r \le c\), so at most one request of the first \(\textsc {twochain}\) chain is not served by \(\textsc {opt}\) (recall \(c \le 1\)), and there are two subcases.

        1. Case 2.2.2.1:

          \(k=r\), so all the requests of the initial \(\textsc {twochain}\) path are also served by \(\textsc {opt}\). Please see the Appendix where we show that in all subcases of Case 2.2.2.1, P starts at \(o_{new}\), has at least \(m-r-k+1\) requests from \(S_{new}\), and has length \(T_{new}\). Thus:

          $$\begin{aligned} | \textit{OPT}(S_{new}, T_{new}, o_{new})| \ge m-r-k+1 \end{aligned}$$
          (2)

          Then, since \(c=0\) or \(c=1\), we have:

          $$\begin{aligned} | ALG(S,T,o)|\ge & k + \tfrac{1}{3} (| \textit{OPT}(S_{new}, T_{new}, o_{new})| + T_{new} -1)\nonumber \\ & \text {by inductive hypothesis}\nonumber \\\ge & k + \tfrac{1}{3} (m-r-k+1 + T_{new} - 1)~ \text {by~(2)}\nonumber \\\ge & k + \tfrac{1}{3} (m-2k+1 + T-k-c - 1)~ \text {since}~k=r\nonumber \\\ge & \tfrac{1}{3} (m + T-c) \ge \tfrac{1}{3} (m + T-1)~\text {since}~c=0~\text {or}~1 \end{aligned}$$

          So we are done with Case 2.2.2.1 and must now prove Case 2.2.2.2 to complete the proof.

        2. Case 2.2.2.2:

          \(k \ne r\). Recall that since \(k-r \le c\), it must be that \(k=r+1\). Please see the Appendix where we show that in all subcases of Case 2.2.2.2, P starts at \(o_{new}\), has at least \(m-r-k\) requests from \(S_{new}\), and has length \(T_{new}\). Thus:

          $$\begin{aligned} | \textit{OPT}(S_{new}, T_{new}, o_{new})| \ge m-r-k \end{aligned}$$
          (3)

          So we have:

          $$\begin{aligned} | ALG(S,T,o)|&\ge k + \tfrac{1}{3} (| \textit{OPT}(S_{new}, T_{new}, o_{new})| + T_{new} -1) \ \\&\text {by inductive hypothesis} \\&\ge k + \tfrac{1}{3}(m-r-k + T-k-c - 1)~ {\text {by}~(3)}\\&= \tfrac{1}{3}(m + T -1 -r +k -c)\\&= \tfrac{1}{3}(m + T-1 -r +(r +1) -c)\\&\ge \tfrac{1}{3}(m + T - 1 ) \end{aligned}$$

This completes the proof.\(\square \)

Theorem 5

\(\textsc {twochain}\) gives a 2/3 approximation for URDARP.

Proof

We again proceed by considering two cases.

  • Case 1: \(T \ge 2\vert S\vert \): Then by Lemma 3, \(\vert ALG(S,T,o)\vert = \vert \textit{OPT}(S, T,o)\vert \), and we are done.

  • Case 2: \(T<2\vert S\vert \): Then by Lemma 4, \(\vert ALG(S,T,o)\vert \ge \tfrac{1}{3}(\vert \textit{OPT}(S,T,o)\vert + T - 1)\). As in Lemma 4, let \(m = \vert \textit{OPT}(S,T,o)\vert \). There are two subcases.

    • Case 2.1: If \(m<T\), then \(\vert ALG(S,T,o)\vert \ge \tfrac{1}{3}(m + T - 1) > \tfrac{1}{3} (m + m -1)\). Since \(\vert ALG(S,T,o)\vert \) is an integer, this implies \(\vert ALG(S,T,o)\vert \ge 2m/3\).

    • Case 2.2: If \(m=T\), then an \(\textit{OPT}(S,T,o)\) solution must be

      $$(o=v_0,v_1),(v_1,v_2), \ldots , (v_{m-1},v_m)$$

      where every drive must be a service drive, serving a request from S. We use the same definitions of k, r, and c as in Lemma 4 and note that \(c=0\) in this case. Denote the first chain served by \(\textsc {twochain}\) as \((o=u_0,u_1),(u_1,u_2),\ldots ,(u_{k-1},u_k)\). Note that \(\textsc {twochain}\) would start with a service drive right from o because in this case there is a 2-chain starting at o. If \(k=T=m\) then \(\vert ALG(S,T,o)\vert = \vert \textit{OPT}(S, T, o)\vert \) so we are done. If \(m=1\) or \(m=2\), then \(k=m\), so we are done. If \(m=3\) then \(k=2\) or 3, and in both cases we have \(k\ge 2m/3\), so we are also done. So we consider the case where \(m \ge 4\) (so \(k \ge 2\)) and \(k <m\). After \(\textsc {twochain}\) serves the first chain, the server is at \(u_k\) and there is \(T-k\) time remaining, so in the smaller instance of the problem, \(T_{new} = T-k\), and \(o_{new} = u_k\). Since \(\vert \textit{OPT}(S,T,o)\vert =m\), then \(\vert \textit{OPT}(S_{new},T+1,u_k)\vert \ge m-r\), since in time \(T+1\), \(\textsc {opt}\) can drive from \(u_k\) to the origin, and then follow the path of \(\textit{OPT}(S,T,o)\) to serve \(m-r\) requests (recall that r is the number of requests in \(\textit{OPT}(S,T,o)\) that are also in the first chain of \(ALG(S,T,o)\)). So recalling that \(T_{new} = T-k\), and that shortening a schedule by \(k+1\) time units removes at most \(k+1\) requests, we have,

      $$\begin{aligned} \vert \textit{OPT}(S_{new}, T_{new}, u_k)\vert \ge m-r-k-1 \end{aligned}$$
      (4)

      And thus:

      $$\begin{aligned} \vert ALG(S,T,o)\vert&= \vert ALG(S_{new},T_{new},u_k)\vert + k \\&\ge \tfrac{1}{3}(\vert \textit{OPT}(S_{new},T_{new},u_k)\vert + T_{new}-1) + k~\text {by Lemma}~4 \\&\ge \tfrac{1}{3}(m - r - k - 1 + T - k - 1) + k~\text {by~(4)} \\&= \tfrac{1}{3}(2m) + \tfrac{1}{3}(-r+k-2) \ge 2m/3~\text {unless}~k=r~\text {or}~k=r+1 \end{aligned}$$

      For the cases of \(k=r\) and \(k=r+1\), we follow the same steps we did for these cases in the proof of Lemma 4 to modify the \(\textsc {opt}\) path.

      • \(k=r\): Then by Case 2.2.2.1 of the proof of Lemma 4, we have

        $$\vert \textit{OPT}(S_{new}, T_{new}, o_{new} )\vert \ge m-r-k+1.$$

        So:

        $$\begin{aligned} \vert ALG(S,T,o)\vert&= \vert ALG(S_{new},T_{new},u_k)\vert + k \\&\ge \tfrac{1}{3}(\vert \textit{OPT}(S_{new},T_{new},u_k)\vert + T_{new}-1) + k~\text {by Lemma}~4 \\&\ge \tfrac{1}{3}(m-r-k+1 +T-k-1) + k\\&\ge \tfrac{1}{3}(2m -3k) + k~\text {since}~T=m~\text {and}~r=k \\&\ge 2m/3 \end{aligned}$$
      • \(k=r+1\): Then by Case 2.2.2.2 of the proof of Lemma 4, we have

        $$\vert \textit{OPT}(S_{new}, T_{new}, o_{new}) \vert \ge m-r-k.$$

        So:

        $$\begin{aligned} \vert ALG(S,T,o)\vert\ge & \tfrac{1}{3}(m-r-k +T-k-1) +k \\\ge & \tfrac{1}{3}(2m -3k) +k~\text {since}~T=m~\text {and}~r=k-1\\\ge & 2m/3 \end{aligned}$$

We have shown that for all cases, \(\vert ALG(S,T,o)\vert \ge 2m/3\), so the proof is complete.\(\square \)

We now show that the approximation ratio of 2/3 for \(\textsc {twochain}\) is tight.

Theorem 6

The approximation ratio of \(\textsc {twochain}\) for URDARP is no greater than 2/3.

Proof

Consider an instance with three requests in a single chain with no requests starting at the origin o. Let \(T=4\). \(\textsc {twochain}\) may select the second and third requests of the chain as its first two requests. For \(\textsc {twochain}\) to drive to and then serve the two requests takes three time units. It then drives and runs out of time. On the other hand, \(\textsc {opt}\) starts at the first request of the chain and completes all three requests by time \(T=4\).\(\square \)

3.2 The k-chain Algorithm

We now show that a natural generalization of \(\textsc {twochain}\), which we refer to as k-chain (see Algorithm 2), yields at most a 7/9-approximation. For any fixed k, this algorithm runs in time polynomial in the number of requests (though the degree of the polynomial grows with k), and proceeds analogously to \(\textsc {twochain}\); rather than prioritizing requests that are the first in a 2-chain, it instead prioritizes requests that are the first in a k-chain. One might expect that this algorithm yields a \(k/(k+1)\)-approximation but we show that, surprisingly, there exists an instance of URDARP where k-chain serves no more than 7/9 of the number of requests served by the optimal solution. The precise instance is given below in Fig. 2, where the main idea is that the algorithm may at first favor a long sequence of requests, leaving many now-isolated smaller sequences to serve later, resulting in fewer requests served overall.

Algorithm 2

The k-chain algorithm.
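
As with Algorithm 1, the pseudocode of Algorithm 2 appears only as a figure. The sketch below shows the only piece that changes relative to the \(\textsc {twochain}\) sketch in Section 3.1: the test for whether a request begins a chain of at least k requests (names and representation are the same illustrative assumptions as before).

```python
def begins_k_chain(req, k, remaining):
    """True if `req` can be extended, using distinct requests from `remaining`,
    into a chain of at least k requests beginning with `req`."""
    if k <= 1:
        return True
    rest = [r for r in remaining if r is not req]
    return any(r[0] == req[1] and begins_k_chain(r, k - 1, rest) for r in rest)
```

Replacing the 2-chain test with this one, and preferring requests that pass it, recovers \(\textsc {twochain}\) at \(k=2\); the recursive search is also what makes the runtime grow rapidly with k.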

Theorem 7

The k-chain algorithm yields at most a 7/9-approximation.

Proof

In the input instance (see Fig. 2) there is a chain of \(c+k\) requests, for positive integers c and k, and the origin, o, is at the start of this chain. Denote these \(c+k\) requests as \((v_0,v_1),(v_1,v_2), (v_2,v_3), \ldots , (v_{c-1},v_c), \ldots , \) \((v_{c+k-1},v_{c+k})\), so \(o=v_0\). In addition, for each point \(v_i\), for \(i=1,2,\ldots , c\), there is another pair of requests: one that leaves from \(v_i\) to a point not on the chain, call it \(v_i'\), and another that leaves from \(v_i'\) and returns to \(v_i\), forming a total of c loops each of length 2.

Let \(T=3c\). Then \(\vert \textsc {opt}(S,T,o)\vert = \vert \textsc {opt}(S,3c,v_0)\vert = 3c\) since \(\textsc {opt}\) can serve all the loops “on the way” as it proceeds across from \(v_1\) to \(v_{c+k}\). I.e., the optimal schedule is

$$(v_0,v_1), (v_1, v_1'), (v_1',v_1),(v_1,v_2), (v_2,v_2'),(v_2',v_2),(v_2,v_3), \ldots , (v_{c+k-1}, v_{c+k}).$$

On the other hand, Algorithm 2, which prioritizes chains of length k, may choose one request at a time from the “spine” of this input instance, and end up serving all the requests along the straight path first, rather than serving the loops along the way. In this event, at time \(c+k\) it must then go back and serve as many loops (chains of length 2) as it can in the remaining \(3c-(c+k)=2c-k\) units of time, expending one unit of time on an empty drive to the next loop after serving each loop. Hence:

$$\begin{aligned} \vert \textsc {alg}(S,T,o)\vert = c+ k + \left\lfloor \tfrac{2}{3}(2c-k)\right\rfloor \end{aligned}$$

And note that

$$\begin{aligned} \lim _{c\rightarrow \infty }\frac{\vert \textsc {alg}(S,T,o)\vert }{\vert \textsc {opt}(S,T,o)\vert } = \lim _{c\rightarrow \infty }\frac{c+ k + \left\lfloor \tfrac{2}{3}(2c-k)\right\rfloor }{3c} = \frac{7}{9}. \end{aligned}$$
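
For completeness, the algebra behind the final limit (ignoring the floor, which changes the count by less than one and does not affect the limit) is:

$$\begin{aligned} \frac{c+ k + \tfrac{2}{3}(2c-k)}{3c} = \frac{3c + 3k + 4c - 2k}{9c} = \frac{7c+k}{9c}, \end{aligned}$$

which tends to 7/9 as \(c\rightarrow \infty \).
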
Fig. 2

An instance showing that the k-chain algorithm has approximation ratio at most 7/9. Note that the graph is complete, but for simplicity we show only relevant edges.

3.3 The Longest-Chain-First Algorithm

We now provide a discussion of the greedy algorithm that serves a longest chain of (distinct) requests first, removes these requests from the instance, then serves a longest chain among the remaining requests and removes these, and continues this way until time runs out. In essence, this algorithm is k-chain as k tends to infinity. That is, for any particular instance, this algorithm behaves the same as k-chain for k sufficiently large. We refer to this algorithm as the longest-chain-first (lcf) algorithm. Please see Algorithm 3 for the formal definition; as with k-chain, when there is a request at the origin, the algorithm does not initially move away to the overall longest chain, but instead starts with the longest chain that begins at the origin.

Algorithm 3

The Longest-Chain-First algorithm.

Implementation of Line 9 of Algorithm 3 requires a solution to the longest trail problem, where a trail is defined as a walk with no repeated edges, i.e., a chain of DARP requests. Although the longest trail problem is NP-hard [35, 36], a standard polynomial-time algorithm that uses a topological sort on the vertices as a pre-processing step can be employed for finding the longest trail in the special case of acyclic graphs. This is because when the graph is acyclic, the longest trail of the graph is also the longest path of the graph, and the longest path in a directed acyclic graph can be found in linear time using dynamic programming [37]. We use the term request-graph to refer to the directed multigraph where each request is represented by an edge in the graph and each vertex in the graph is the source or destination of a request. So if we consider the space of inputs where the request-graphs are acyclic, we can employ the polynomial-time algorithm for finding the longest trail in an acyclic graph to implement the greedy lcf algorithm.
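
Under the acyclic-graph assumption just described, the following is a minimal Python sketch of that implementation (the request representation matches the earlier sketches and is an illustrative assumption): a depth-first post-order traversal yields a reverse topological order, after which the standard longest-path dynamic program recovers a longest chain.

```python
from collections import defaultdict

def longest_chain_dag(requests):
    """Longest chain in an acyclic request-graph; requests are (source, dest) pairs.
    Returns the chain as an ordered list of requests."""
    out_edges = defaultdict(list)
    nodes = set()
    for (u, v) in requests:
        out_edges[u].append(v)
        nodes.update((u, v))
    if not nodes:
        return []

    order, seen = [], set()
    def dfs(u):                      # post-order DFS: descendants are appended before u
        seen.add(u)
        for v in out_edges[u]:
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs(u)

    # Dynamic program: `order` lists every vertex after all of its successors,
    # so best_len[v] is final by the time any edge (u, v) is relaxed.
    best_len = {u: 0 for u in nodes}
    best_next = {u: None for u in nodes}
    for u in order:
        for v in out_edges[u]:
            if 1 + best_len[v] > best_len[u]:
                best_len[u], best_next[u] = 1 + best_len[v], v

    start = max(nodes, key=lambda u: best_len[u])
    chain, u = [], start
    while best_next[u] is not None:
        chain.append((u, best_next[u]))
        u = best_next[u]
    return chain
```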

Fig. 3

An instance showing that the lcf algorithm has an approximation ratio of at most 5/6. Note that the graph is complete, but for simplicity we show only relevant edges.

It turns out that even when restricting to acyclic graphs, uniform revenues and a uniform metric space, the lcf algorithm yields an approximation ratio of at most 5/6.

Theorem 8

The approximation ratio of the lcf algorithm for URDARP is at most 5/6.

Proof

Please refer to Fig. 3. The instance depicts a request-graph for which \(T=8\) and the origin is one unit away from the source of all requests. An optimal solution is to serve the top 3-chain followed by the bottom 3-chain for 6 rides served. The lcf algorithm may instead start with \((v_1,v_2)\), but then take \((v_2,v_7)\), finishing with \((v_7,v_8)\). lcf would then require an empty drive to a remaining 2-chain, but after serving the 2-chain, there would be no time left to drive to and serve any more requests, so lcf serves only 5 rides.\(\square \)

Note that \(\textsc {lcf}\) will never serve a chain that is a proper subchain of another available chain in the request-graph, since \(\textsc {lcf}\) must serve longer chains before any shorter chains (e.g. a proper subchain), unless it is the final chain that \(\textsc {lcf}\) serves and it runs out of time, or it is the first chain and the origin happens to be in the middle of a longer chain. If \(\textsc {lcf}\) does exhaust the available time, we may assume without loss of generality that the first request of the final chain that \(\textsc {lcf}\) serves had no incoming requests.

We now provide results showing structural properties of the graph or the time limit under which \(\vert \textsc {lcf}\vert = \vert \textsc {opt}\vert \).

Lemma 9

If no two requests share a common destination, then \(\textsc {lcf}\) performs optimally.

Proof

Let A be an optimal schedule that agrees with \(\textsc {lcf}\) for the longest time among all optimal schedules. Let \(t_A\) denote the earliest time at which \(\textsc {lcf}\) and A diverge; i.e., the earliest time where \(\textsc {lcf}\) begins serving a request that A does not also serve at the same time. So either (1) \(\textsc {lcf}\) serves a request during the interval \([t_A,t_A+1]\) that differs from what A serves in that interval or (2) \(\textsc {lcf}\) serves a request during \([t_A,t_A+1]\) while A moves. (Note that by the definition of \(\textsc {lcf}\), the case where \(\textsc {lcf}\) moves while A serves means that the two schedules must have diverged at a time earlier than \(t_A\), which contradicts the definition of \(t_A\).) Then no other optimal schedule B has \(t_B\) (analogously defined) such that \(t_{B}>t_A\).

Let \(A_x\) (respectively \(\textsc {lcf}_x\)) refer to the chain that A (respectively \(\textsc {lcf}\)) most recently started serving at or before time \(t_A\).

Case 1: \(t_A\) occurs in the middle of a chain being executed by \(\textsc {lcf}\) (and possibly also by A). To be precise, this is the case that \(\textsc {lcf}\) serves a request right before and right after time \(t_A\). Note that in this case the intersection of \(A_x\) and \(\textsc {lcf}_x\) is nonempty. Let \(\textsc {lcf}_x'\) and \(A_x'\) denote the remainder of \(\textsc {lcf}_x\) and \(A_x\), respectively, beyond \(t_A\). Since \(\textsc {lcf}\) serves the longest chain, \(\textsc {lcf}_x'\) must be at least as long as \(A_x'\), which could be empty.

  • Subcase 1. Schedule A never serves any of the requests that are in \(\textsc {lcf}_x'\). We can then swap the requests in \(A_x'\) with the requests from \(\textsc {lcf}_x'\), resulting in a still-optimal schedule which is consistent with \(\textsc {lcf}\) longer than A, contradicting the definition of A.

  • Subcase 2. Schedule A serves the first request of \(\textsc {lcf}_x'\) at some later time \(t^*>t_A\). Since no two requests share a common destination, then A must have an empty move right before time \(t^*\). Thus, we can reorder A by swapping the chain served by A at time \(t^*\) with the chain served by A at \(t_A\) (note both these chains start at the same location), resulting in a still-optimal schedule which is consistent with \(\textsc {lcf}\) longer than A, likewise contradicting the definition of A.

  • Subcase 3. Schedule A does not serve the first request of \(\textsc {lcf}_x'\) anywhere in its schedule, but does serve some part of \(\textsc {lcf}_x'\) at some later time \(t^* > t_A\). Since no two requests share a common destination, A must have an empty move right before time \(t^*\). Let the source of the request served by A at time \(t^*\) be called s. Let \(\overline{\textsc {lcf}_x'}\) be the prefix of \(\textsc {lcf}_x'\) that ends at s. We can create a modified version of A called \(A^*\) by replacing the portion of A from \(t_A\) to \(t^*\) with \(\overline{\textsc {lcf}_x'}\), then inserting the replaced portion of A into \(A^*\) after the end of the chain that A started serving at time \(t^*\). Note that since swapping in \(\overline{\textsc {lcf}_x'}\) allows \(A^*\) to proceed straight to s with no empty drives, and \(\textsc {lcf}_x'\) is at least as long a chain as \(A'_x\) (as noted above), schedule \(A^*\) is no less optimal than A but agrees with \(\textsc {lcf}\) longer than A, contradicting the definition of A.

Case 2: \(t_A\) is at the start of new chains being executed by A and \(\textsc {lcf}\), so the intersection of \(A_x\) and \(\textsc {lcf}_x\) is empty. As in Case 1, regardless of whether A later serves all or a portion of \(\textsc {lcf}_x\), a swap could be made to produce a still-optimal schedule which is consistent with \(\textsc {lcf}\) longer than A, likewise contradicting the choice of A. As in Case 1, the subcases rely on both the property that no two requests share a common destination, and that \(\textsc {lcf}\) does in fact serve the longest available chain.\(\square \)

Lemma 10

If no two requests share a common source, then \(\textsc {lcf}\) performs optimally.

Proof

The proof mimics the previous proof, where when looking at commonalities and remainders, we now look at the portion that \(\textsc {lcf}\) or the optimal solution served (or could have served) before the common portion, rather than afterwards.\(\square \)

Lemma 11

Any URDARP instance (STo) with \(T < 7\) and no request at the origin has \(\vert \textsc {lcf}(S,T,o)\vert =\vert \textsc {opt}(S,T,o)\vert \).

Proof

The proof considers all possible behaviors of \(\textsc {opt}\) on instances with \(T < 7\), after first restricting the sets that need to be considered in detail. We assume without loss of generality that T is no later than the time at which \(\textsc {opt}\) finishes serving all available requests. We need not consider instances where \(\textsc {opt}\) travels only a single path, as \(\textsc {lcf}\) behaves optimally on such instances.

We then consider the possible path lengths which \(\textsc {opt}\) serves by T. We can assume without loss of generality that the paths that \(\textsc {opt}\) serves are in non-increasing order of length. Let, for example, \(2-1-1\) denote that \(\textsc {opt}\) serves a path of length 2, travels for one time unit, serves a path of length 1, travels for an additional time unit, and serves another path of length 1. The set of all possible \(\textsc {opt}\)-configurations that have not yet been ruled out thus consists of \(\{4-1, 3-2, 2-1-1, 3-1, 2-2, 1-1-1, 2-1, 1-1\}\).

We now show that \(\textsc {lcf}\) will serve the same number of requests as \(\textsc {opt}\) in all of these \(\textsc {opt}\)-configurations.

First we address the cases where \(\textsc {opt}\) serves a chain of any length followed only by singletons. Since \(\textsc {opt}\) serves the chain as its first action, \(\textsc {lcf}\) must be able to serve a chain of the same length (or longer, should one exist). Then, whether or not \(\textsc {lcf}\) served the identical chain to \(\textsc {opt}\), if it served a chain of the same length as \(\textsc {opt}\), the same number of requests remain available, irrespective of the configuration, and thus \(\textsc {lcf}\) can likewise serve singletons in the remaining time. If \(\textsc {lcf}\) served a chain longer than that served by \(\textsc {opt}\), \(\textsc {lcf}\) can likewise serve as many requests as \(\textsc {opt}\) overall because \(\textsc {opt}\) served only singletons after the first chain. Hence, \(\textsc {lcf}\) serves the same number of requests as \(\textsc {opt}\) in all such cases, namely \(\{4-1, 2-1-1, 3-1, 1-1-1, 2-1, 1-1\}\).

We now consider the 2-2 case, which must arise from \(T=5\). There cannot be a chain of length 5 or greater, since \(\textsc {opt}\) served only 4 requests within the time limit. If there is a chain of length 4, \(\textsc {lcf}\) will serve it, matching the number of rides served by \(\textsc {opt}\). If there is a chain of length 3, \(\textsc {lcf}\) would likewise serve it and then be able to serve a singleton by the argument in the preceding paragraph, again matching the number of rides served by \(\textsc {opt}\). We are now at the case where the original request-graph contains no chains longer than 2, and contains at least two chains of length 2. Accordingly, even if \(\textsc {lcf}\) chooses a different 2-chain than \(\textsc {opt}\) to serve first, another chain of length 2 must remain for \(\textsc {lcf}\) to serve. Thus, in the 2-2 case \(\textsc {lcf}\) again matches the number of rides served by \(\textsc {opt}\).

We finally consider the 3-2 case, which must arise from \(T=6\). Analogously to the 2-2 case, if the longest chain in the original instance has length 4 or greater, \(\textsc {lcf}\) serves the same number of rides as \(\textsc {opt}\). We are now at the case where the original request-graph contains no chains longer than 3, but contains at least one such chain. If it contains only one, then \(\textsc {lcf}\) must choose the same first requests to serve as \(\textsc {opt}\), and then since \(\textsc {opt}\) is able to serve two additional consecutive requests, \(\textsc {lcf}\) can as well, matching the number of rides served by \(\textsc {opt}\). Suppose instead that the original request-graph contains two or more chains of length 3 (but still no longer chains). Namely, let one such chain be \(a\rightarrow b\rightarrow c\rightarrow d\). No request can leave d or enter a, as that would create a chain longer than 3. Any chains that leave a or enter d neither disrupt nor are disrupted by the service of the chain \(a\rightarrow b\rightarrow c\rightarrow d\). There can be at most an additional 2-chain leaving b, with a singleton or nothing else entering b. Similarly, there can be an additional 2-chain entering c, and a singleton or nothing leaving it. All such configurations mean that any chain of length 3 that \(\textsc {lcf}\) might choose, whether or not it involves any of a, b, c, d, would still leave a chain of length 2 available to \(\textsc {lcf}\), so \(\textsc {lcf}\) again matches the number of rides served by \(\textsc {opt}\).\(\square \)

4 Empirical Analysis

To evaluate the performance of our \(\textsc {twochain}\) algorithm (see Section 3.1), we simulated a Dial-a-Ride system on a uniform metric space with 50 locations. For each request, the source and destination were chosen uniformly at random (but the same node could not be chosen as both the source and destination of a request). We generated 25, 50, 75, and 100 requests with time limits \(T=25\) and \(T=50\) and compared \(\textsc {twochain}\) with k-chain for \(k=1, 3\), and 4. To compare \(\textsc {lcf}\) with \(\textsc {twochain}\) experimentally, we implemented a variation of lcf, which we refer to as pseudo-longest-chain-first (pseudo-lcf) because it is guaranteed to find the longest chain only on acyclic graphs. We implemented this as described in Section 3.3 above for acyclic graphs: first we ran a depth-first topological sort on the request-graph, then ran the longest-path dynamic programming algorithm on the resulting directed acyclic graph. For graphs with cycles, there is no way to topologically sort the vertices, so the algorithm sorts the vertices using the same depth-first procedure and proceeds with the longest-path algorithm on the resulting ordered graph, despite it not being a proper topological sorting. Thus, for graphs with cycles, pseudo-lcf may not always find the longest chain; instead, it finds some acyclic path that is a sub-path of a potentially longer chain.
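
For reproducibility, the instance generation just described can be sketched as follows in Python (parameter names and the use of Python's random module are illustrative; the released codebase [38] is the authoritative implementation):

```python
import random

def random_uniform_instance(num_locations, num_requests, seed=None):
    """Generate requests on a uniform metric with `num_locations` points:
    each request draws a source and a distinct destination uniformly at random."""
    rng = random.Random(seed)
    requests = []
    for _ in range(num_requests):
        src = rng.randrange(num_locations)
        dst = rng.randrange(num_locations)
        while dst == src:
            dst = rng.randrange(num_locations)
        requests.append((src, dst))
    return requests

# e.g., one trial of the T=25 experiments above: random_uniform_instance(50, 100)
```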

Fig. 4

Algorithms tested on randomly generated instances with 50 nodes and \(T=25\) (left) and \(T=50\) (right). Points plotted show the number of requests served by each algorithm, given a total number of requests released of 25, 50, 75, and 100.

Figures 4-7 show some experimental results. The left side of Figure 4 shows the number of requests served by k-chain, for \(k=1\) to \(k=4\), and pseudo-\(\textsc {lcf}\), with 25, 50, 75, and 100 total requests for \(T=25\), while the right side of Fig. 4 shows these values for \(T=50\). The graphs report the averages over 50 trials for each algorithm. They indicate that although using \(k=3\) or \(k=4\) yields a higher number of requests served than \(\textsc {twochain}\) (i.e. \(k=2\)), for both \(T=25\) and \(T=50\) (Fig. 4), the increase is relatively small and diminishes as k increases. In contrast, the increase in requests served from \(k=1\) to \(\textsc {twochain}\) is more significant. The runtime of k-\(\textsc {chain}\) grows super-polynomially in k, with apparently only marginal improvement in number of rides served as k increases. As such, since \(\textsc {twochain}\) is a much simpler and faster algorithm than pseudo-\(\textsc {lcf}\), these results suggest that \(\textsc {twochain}\) may be the preferred algorithm to use in practice, depending on the application setting.

True \(\textsc {lcf}\) and \(\textsc {opt}\) were not a part of the above experiments because the longest trail problem and URDARP are both NP-hard. For instance, the exponential runtime of our brute-force implementation of \(\textsc {opt}\) for URDARP made executing the algorithm beyond an input size of 15 requests prohibitively time-consuming. Thus, to provide more context for the performance of k-\(\textsc {chain}\), we now present a second set of experiments that illustrate the performance of k-\(\textsc {chain}\) when the parameter k is increased up to \(k=20\). These experiments include more points in the metric space, a higher density of requests, and a longer time limit.
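
For context, the brute-force optimum mentioned above amounts to an exhaustive search over the orders in which remaining requests could be served within the time budget. The following sketch (our own illustration, not the code used in the experiments) makes the exponential blow-up apparent:

```python
def brute_force_opt(requests, origin, T):
    """Maximum number of requests servable within T time units on a uniform
    metric, by exhaustive search; exponential in the number of requests."""
    def best(loc, t, remaining):
        result = 0
        for i, (src, dst) in enumerate(remaining):
            cost = (0 if src == loc else 1) + 1   # optional empty drive plus the service drive
            if t + cost <= T:
                rest = remaining[:i] + remaining[i + 1:]
                result = max(result, 1 + best(dst, t + cost, rest))
        return result
    return best(origin, 0, list(requests))
```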

Fig. 5

Average number of requests served by k-\(\textsc {chain}\) for \(k = 1,2, \ldots , 20\)

Fig. 6

Average number of requests served by k-\(\textsc {chain}\) for 1000, 1100, 1200, 1300, and 1500 total requests released

Figure 5 shows simulations of k-\(\textsc {chain}\) with k from 1 to 20 on randomly generated inputs of \(1000,1100,\ldots , 1500\) requests, with 100 locations/points in a uniform metric, and a time limit T of 800. For each total number of requests, we generated 200 random inputs and averaged the number of requests served. From the figure, it is again clear that as k increases, the performance of k-\(\textsc {chain}\) improves. However, as the total number of requests approaches 1500, we see that the marginal improvement as k increases is reduced.

Figure 6 provides another depiction of these results, emphasizing that, as the number of requests increases, causing the space to be more densely filled with requests and creating more and longer chains, the improvement of k-\(\textsc {chain}\) as k increases from 1 to 20 becomes less dramatic. Each curve in this figure corresponds to a fixed total number of requests, and shows how increasing k affects the number of requests served. This plot indicates that, when running the k-\(\textsc {chain}\) algorithm, practitioners may wish to select the k-value they use based on how densely the metric space is expected to be filled with requests. If the number of requests is high relative to the number of points in the space, then choosing a higher value of k does not yield the same benefit as when the space is more sparsely populated with requests.

Finally, we consider the runtime of k-\(\textsc {chain}\) relative to its performance in terms of the number of requests served. Since the runtime of k-\(\textsc {chain}\) is \(\Omega (n^k)\), increasing exponentially with k, we wish to determine the point at which the performance gain is no longer enough to offset the runtime. To this end, we measure the average execution time of k-\(\textsc {chain}\) when the number of requests is 1000, and compare it with the number of requests served. (Please see Fig. 7.) As expected, 1-chain is the fastest, but also yields the fewest requests served, while on the other hand, 20-chain serves the most requests, but also takes the longest to run. This test suggests that an ideal k for practitioners to consider in a setting with request density and time horizon similar to this one could be between 10 and 12. The simulation codebase is publicly available [38] for modeling and testing to find an informed choice of k in real-world settings with other request density and time limit parameters.

Fig. 7

Runtime versus performance of k-\(\textsc {chain}\) for \(k=1,2, \ldots , 20\)

5 Concluding Remarks

This work provides a theoretical foundation for analyzing the more general forms of the problem. Though the assumptions of uniform revenue and uniform metric should ultimately be relaxed, we found that when removing either of these uniformity restrictions, basic greedy algorithms yield ratios no better than 1/2. We now provide two lemmas that show these results, in contrast with the ratio of 2/3 that \(\textsc {twochain}\) achieves for URDARP. Additionally, we show that no deterministic online algorithm for the online version of the problem can achieve a competitive ratio greater than 2/3, matching the approximation guarantee we provided for our proposed algorithm.

Lemma 12

The approximation ratio of the greedy algorithm that repeatedly chooses a maximum-revenue request to serve for DARP with non-uniform revenues on a uniform metric is 1/2.

Proof

Consider an instance with x requests chained together, each with revenue r, and x individual requests, none of which are connected to other requests, each with revenue \(r + \epsilon \) for some small \(\epsilon >0\). No requests start at the origin o. Let \(T = x+1\). \(\textsc {opt}\) will serve all of the x requests that are chained together, earning xr revenue. The greedy algorithm, which repeatedly chooses a maximum-revenue request to serve, will serve only the requests with revenue \(r+\epsilon \), and can serve only \(\lceil x/2\rceil \) of them in time T, earning \(\lceil x/2\rceil (r + \epsilon )\). Unsurprisingly, the competitive ratio of 1/2 for the online setting from [39] carries over to this setting, completing the result.\(\square \)

We now show that if we instead remove the uniform metric assumption, the approximation ratio of a similar greedy algorithm is at most 1/2.

Lemma 13

The approximation ratio of the algorithm that greedily chooses the shortest request to serve for DARP with uniform revenues on a non-uniform metric is no greater than 1/2.

Proof

Let a, b, and c denote three points in a non-uniform metric space such that the distance between a and b and the distance between b and c are both T/k, for some positive even integer k such that \(T\bmod k=0\), and the distance between a and c is \(T/k - \epsilon \), for some small \(\epsilon >0\). Let a be the origin. Consider an instance on this space with k/2 requests from a to b, k/2 requests from b to a, and k/2 requests from a to c. \(\textsc {opt}\) will alternately serve the k/2 requests from a to b and the k/2 requests from b to a, i.e. as a chain of k requests, serving all k of them by time T. An algorithm that greedily chooses the shortest request to serve next will serve only the requests from a to c, spending \(T/k - \epsilon \) time on an empty drive from c back to a between consecutive serves, thereby serving k/2 requests in total.\(\square \)

In the next lemma, we consider the online setting where requests are released over time and the earliest a request can be served is at its release time. For this setting, we give a general bound, and show that no online algorithm can have competitive ratio greater than 2/3.

Lemma 14

The competitive ratio of any deterministic online algorithm \(\textsc {on}\) for online-URDARP is no greater than 2/3.

Proof

Consider the following instance. At time 0, the adversary releases a set of requests S such that no node appears more than once as either a source or a destination, ensuring that each request must be served as a singleton, where \(\vert S\vert = T\), for some large T that is divisible by 4. Let \(R = r_1, r_2, \ldots , r_{T/4}\) denote the subset of requests from S that \(\textsc {on}\) chooses to serve up to time T/2. At time T/2, the adversary releases T/4 requests \(r'_1, r'_2, \ldots , r'_{T/4}\) such that the sequence of requests \(\sigma = r_1, r'_1, r_2, r'_2, \ldots , r_{T/4}, r'_{T/4}\) forms a chain.

\(\textsc {opt}\) will serve the subset \(S-R\) until time T/2. At time T/2, \(\textsc {opt}\) serves the chain \(\sigma \) while \(\textsc {on}\) can serve only at every other time unit (either one of the new requests or a request from \(S-R\)). \(\textsc {opt}\) serves a total of \(T/4 + T/2\) requests while \(\textsc {on}\) serves a total of T/2 requests, so \(\frac{\textsc {on}}{\textsc {opt}} = \frac{2}{3}\).\(\square \)

This work has provided a tight analysis of the \(\textsc {twochain}\) algorithm, which gave a 2/3-approximation. We find it surprising that the theorem seemed to demand such technical case analysis, rather than surrendering to a more elegant proof by induction or exchange argument. This complication, together with the fact that k-chain does not admit the \(k/(k+1)\)-approximation one might expect, points to underlying subtleties of the problem, suggesting that, even in the uniform metric and uniform revenue case, there is more to this problem than meets the eye.

Since k-chain did not yield the \(k/(k+1)\)-approximation as expected, it remains an open problem to determine its exact approximation ratio. Likewise, because of lcf’s simplicity as a natural greedy algorithm, we hope to resolve the question of whether \(\textsc {lcf}\) is a 5/6-approximation.