1 Introduction

The Matroid Secretary Problem (MSP), introduced by Babaioff et al. [1], is a natural and well-known generalization of the classical Secretary Problem [2], motivated by strong connections and applications in mechanism design. Formally, MSP is an online selection problem where we are given a matroid \(\mathcal {M}= (N, \mathcal {I})\), with elements of unknown weights \(w :N \rightarrow \mathbb {R}_{\ge 0}\) that appear one-by-one in uniformly random order. Whenever an element appears, it reveals its weight and one has to immediately and irrevocably decide whether to select it. The goal is to select a set of elements \(I \subseteq N\) that (i) is independent, i.e., \(I \in \mathcal {I}\), and (ii) has weight \(w(I) = \sum _{e \in I} w(e)\) as large as possible. The key challenge in the area is to settle the notorious Matroid Secretary Problem (MSP) Conjecture:

Conjecture 1.1

([1]) There is an O(1)-competitive algorithm for MSP.

The best-known procedures for MSP are \(O(\log \log ({{\,\textrm{rank}\,}}(\mathcal {M})))\)-competitive [3, 4], where \({{\,\textrm{rank}\,}}(\mathcal {M})\) is the rank of the matroid \(\mathcal {M}\), i.e., the cardinality of a largest independent set.

Whereas the MSP Conjecture remains open, extensive work in the field has led to constant-competitive algorithms for variants of the problem and restricted settings. This includes constant-competitive algorithms for specific classes of matroids [5,6,7,8,9,10,11,12,13]. Moreover, in terms of natural variations of the problem, Soto [8] showed that constant-competitiveness is achievable in the so-called Random-Assignment MSP, or RA-MSP for short. Here, an adversary chooses \(|N|\) weights, which are then assigned uniformly at random to the ground set elements N of the matroid. (Soto’s result was later extended by Oveis Gharan and Vondrák [14] to the setting where the arrival order of the elements is adversarial instead of uniformly random.) Constant-competitive algorithms also exist for the Free Order Model, where the algorithm can choose the order in which elements appear [9].

Intriguingly, a key aspect of prior advances on constant-competitive algorithms for special cases and variants of MSP is that they heavily rely either on knowing the full matroid \(\mathcal {M}\) upfront or on revealed elements disclosing additional information about the matroid. This is also crucially exploited in Soto’s work on RA-MSP. In fact, if the matroid is not known upfront in full, there is no natural general variant of MSP (such as the Random Assignment and Free Order models mentioned above) for which a constant-competitive algorithm is known.

A high reliance on knowing the matroid \(\mathcal {M}= (N, \mathcal {I})\) upfront (except for its size \(|N|\)) is undesirable when trying to approach the MSP Conjecture, because it is easy to obstruct an MSP instance by adding zero-weight elements. Not surprisingly, all prior advances on the general MSP conjecture, like the above-mentioned \(O(\log \log ({{\,\textrm{rank}\,}}(\mathcal {M})))\)-competitive algorithms [3, 4] and also earlier procedures [1, 15], only need to know \(|N|\) upfront and make calls to an independence oracle on elements revealed so far. Thus, for RA-MSP, it was raised as an open question, both in [8] and [14], whether a constant-competitive algorithm exists without knowing the matroid upfront. The key contribution of this work is to affirmatively answer this question, making the random assignment setting the first MSP variant for which a constant-competitive algorithm is known without knowing the matroid and without any restriction on the underlying matroid.

Theorem 1.2

There is a constant-competitive algorithm for RA-MSP with only the cardinality of the matroid known upfront.

Moreover, our result holds in the more general adversarial order with a sample setting, where we are allowed to sample a random constant fraction of the elements and all remaining (non-sampled) elements arrive in adversarial order. This is also referred to as an order-oblivious algorithm in the MSP literature (see, e.g., [4]).

As mentioned, when the matroid is fully known upfront, an O(1)-competitive algorithm was known for RA-MSP even when the arrival order of all elements is adversarial [14]. Interestingly, for this setting it is known that, without knowing the matroid upfront, no constant-competitive algorithm exists. More precisely, a lower bound on the competitiveness of \(\Omega ({\log |N|}/{\log \log |N|})\) was shown in [14].

1.1 Organization of the paper

We start in Sect. 2 with a brief discussion on the role of (matroid) densities in the context of random assignment models, as our algorithm heavily relies on densities. Decomposing the matroid into parts of different densities has been central in prior advances on RA-MSP. However, this crucially relies on knowing the matroid upfront. We work with a rank-density curve, introduced in Sect. 3.1, which is also unknown upfront; however, we show that it can be learned approximately (in a well-defined sense) by observing a random constant fraction of the elements. Section 3 provides an outline of our approach based on rank-density curves and presents the main ingredients allowing us to derive Theorem 1.2. Section 4 takes a closer look at rank-density curves and shows some of their useful properties. Section 5 showcases the main technical tool that allows us to approximate the rank-density curve from a sample set. Finally, Sect. 6 combines the ingredients to present our final algorithm and its analysis.

We emphasize that we predominantly focus on providing a simple algorithm and analysis, refraining from optimizing the competitive ratio of our procedure at the cost of complicating the presentation.

We assume that all matroids are loopless, i.e., every element is independent by itself. This is without loss of generality, as loops can simply be ignored in matroid secretary problems.

2 Random-assignment MSP and densities

A main challenge in the design and analysis of MSP algorithms is how to protect heavier elements (or elements of an offline optimum) from being spanned by lighter ones that are selected earlier during the execution of the algorithm. In the random assignment setting, however, weights are assigned to elements uniformly at random, which allows for shifting the focus from protecting elements based on their weights to protecting elements based on their role in the matroid structure. Intuitively speaking, an element arriving in the future is at a higher risk of being spanned by the algorithm’s prior selection if it belongs to an area of the matroid of large cardinality and small rank (a “dense” area) than an area of small cardinality and large rank (a “sparse” area).

This is formally captured by the notion of density: the density of a set \(U \subseteq N\) in a matroid \(\mathcal {M}= (N, \mathcal {I})\) is \({|U|}/{r(U)}\), where \(r :2^{N} \rightarrow \mathbb {Z}_{\ge 0}\) is the rank function of \(\mathcal {M}\).

Densities play a crucial role in RA-MSP [8, 14]. Indeed, prior approaches decomposed \(\mathcal {M}\) into its principal sequence, which is the chain \(\emptyset \subsetneq S_{1} \subsetneq \ldots \subsetneq S_{k} = N\) of sets of decreasing densities obtained as follows. \(S_{1} \subseteq N\) is the densest set of \(\mathcal {M}\) (in case of ties it is the unique maximal densest set), \(S_{2}\) is the union of \(S_{1}\) and the densest set in the matroid obtained from \(\mathcal {M}\) after contracting \(S_{1}\), and so on until a set \(S_{k}\) is obtained with \(S_{k} = N\). Figure 1a shows an example of the principal sequence of a graphic matroid.

Fig. 1

Figure 1a shows a graph representing a graphic matroid together with its principal sequence \(\emptyset \subsetneq S_{1} \subsetneq \dots \subsetneq S_{7} = N\), where N is the set of all edges of the graph. Figure 1b shows its rank-density curve. Each step in the rank-density curve (highlighted by a circle) corresponds to one \(S_{i}\) and has y-coordinate equal to the density of \(\mathcal {M}_{i} = \left. \left( \mathcal {M} / S_{i - 1}\right) \right| _{S_{i} {\setminus } S_{i - 1}}\) and x-coordinate equal to \(r(S_{i})\).
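
To make the construction of the principal sequence concrete, the following brute-force Python sketch computes it from a rank oracle on a tiny graphic matroid; the example graph, the function names, and the exhaustive enumeration over subsets (only feasible for very small ground sets) are illustrative assumptions rather than part of the procedures discussed in this paper.

```python
from fractions import Fraction
from itertools import combinations

def graphic_rank(edges):
    """Rank of an edge set in a graphic matroid: size of a spanning forest."""
    parent = {}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    rank = 0
    for (u, v) in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            rank += 1
    return rank

def maximal_densest_set(elements, rank, contracted=frozenset()):
    """Unique maximal densest set of (M / contracted) restricted to `elements`,
    found by exhaustive search; on ties the larger set is kept, which yields the
    unique maximal densest set."""
    base = rank(contracted)
    best_density, best = Fraction(0), frozenset()
    for k in range(1, len(elements) + 1):
        for tup in combinations(elements, k):
            U = frozenset(tup)
            r = rank(U | contracted) - base  # rank of U in the contracted matroid
            if r == 0:
                continue
            d = Fraction(len(U), r)
            if d > best_density or (d == best_density and len(U) > len(best)):
                best_density, best = d, U
    return best, best_density

def principal_sequence(ground, rank):
    """Chain S_1 ⊊ ... ⊊ S_k = ground together with the density of each principal minor."""
    seq, S = [], frozenset()
    while S != ground:
        dense, lam = maximal_densest_set(ground - S, rank, S)
        if not dense:  # only possible if M / S had loops; cannot happen for loopless matroids
            break
        S = S | dense
        seq.append((S, lam))
    return seq

if __name__ == "__main__":
    # toy graph: a doubled edge a-b, a triangle a-b-c, and a pendant edge c-d
    edges = frozenset({("a", "b", 1), ("a", "b", 2), ("b", "c", 1), ("a", "c", 1), ("c", "d", 1)})
    rank = lambda U: graphic_rank([(u, v) for (u, v, _) in U])
    for S, lam in principal_sequence(edges, rank):
        print(len(S), "elements, density of the new principal minor:", lam)
```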

Previous approaches then considered, independently for each \(i \in [k] {:=}\{1, \dots , k\}\), the matroid \(\mathcal {M}_{i} {:=}\left. \left( \mathcal {M} / S_{i - 1}\right) \right| _{S_i{\setminus } S_{i - 1}}\), i.e., the matroid obtained from \(\mathcal {M}\) by first contracting \(S_{i - 1}\) and then restricting to \(S_{i} \setminus S_{i - 1}\). (By convention, we set \(S_{0} {:=}\emptyset \).) These matroids are also known as the principal minors of \(\mathcal {M}\). Given an independent set in each principal minor, their union is guaranteed to be independent in the original matroid \(\mathcal {M}\). Prior approaches (see, in particular, [8] for details) then exploited the following two key properties of the principal minors \(\mathcal {M}_{i}\):

  1. (i)

    \(\sum _{i = 1}^{k} \mathbb {E}[w({\textrm{OPT}}(\mathcal {M}_{i}))] = \Omega (\mathbb {E}[w({\textrm{OPT}}(\mathcal {M}))])\), where \({\textrm{OPT}}(\mathcal {M})\) (and analogously \({\textrm{OPT}}(\mathcal {M}_{i})\)) is an (offline) maximum weight independent set in \(\mathcal {M}\) and the expectation is over all random weight assignments.

  2. (ii)

    Each matroid \(\mathcal {M}_{i}\) is uniformly dense, which means that the (unique maximal) densest set in \(\mathcal {M}_{i}\) is the whole ground set of \(\mathcal {M}_{i}\).

Property (i) guarantees that, to obtain an O(1)-competitive procedure, it suffices to compare against the (offline) optima of the matroids \(\mathcal {M}_i\). Combining this with property (ii) implies that it suffices to design a constant-competitive algorithm for uniformly dense matroids. Since uniformly dense matroids behave in many ways like uniform matroids (which are a special case of uniformly dense matroids), MSP on uniformly dense matroids admits a simple yet elegant O(1)-competitive algorithm. (See [8] for details.)

3 Outline of our approach

As discussed, prior approaches [8, 14] for RA-MSP heavily rely on knowing the matroid upfront, as they need to construct its principal sequence upfront. A natural approach would be to observe a sample set \(S \subseteq N\) containing a constant fraction of all elements and then try to mimic the existing approaches using the principal sequence of \(\left. \mathcal {M}\right| _{S}\), the matroid \(\mathcal {M}\) restricted to the elements in S. A main hurdle lies in how to analyze such a procedure as the principal sequence of \(\left. \mathcal {M}\right| _{S}\) can differ significantly from the one of \(\mathcal {M}\). In particular, one can construct matroids where it is likely that there are parts whose density is underestimated by a super-constant factor. Moreover, \(\left. \mathcal {M}\right| _{S}\) may have many different densities not present in \(\mathcal {M}\) (e.g., when \(\mathcal {M}\) is uniformly dense).

We overcome these issues by not dealing with principal sequences directly, but rather using what we call the rank-density curve of a matroid, which captures certain key parameters of the principal sequence. As we show, rank-density curves have three useful properties:

  1. (i)

    They provide a natural way to derive a quantity that both relates to the offline optimum and can be easily compared against to bound the competitiveness of our procedure.

  2. (ii)

    They can be learned approximately by observing a random \(\Omega (1)\)-fraction of N.

  3. (iii)

    Approximate rank-density curves can be used algorithmically to protect denser areas from sparser ones without having to know the matroid upfront.

Section 3.1 introduces rank-density curves and shows how they conveniently allow for deriving a quantity that compares against the offline optimum. Section 3.2 then discusses our results on approximately learning rank-density curves and how this can be exploited algorithmically.

3.1 Rank-density curves

Given a matroid \(\mathcal {M}= (N, \mathcal {I})\), one natural way to define its rank-density curve \(\rho _{\mathcal {M}} :\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) is through its principal minors \(\mathcal {M}_{1}, \dots , \mathcal {M}_{k}\), which are defined through the principal sequence \(\emptyset \subsetneq S_{1} \subsetneq \dots \subsetneq S_{k} = N\) as explained in Sect. 2. For a value \(t \in (0, {{\,\textrm{rank}\,}}(\mathcal {M})]\), let \(i_{t} \in [k]\) be the smallest index such that \(r(S_{i_{t}}) \ge t\). The value \(\rho _{\mathcal {M}}(t)\) is then given by the density of \(\mathcal {M}_{i_{t}}\). (See Fig. 1b for an example.) In addition, we set \(\rho _{\mathcal {M}}(t) = 0\) for any \(t > {{\,\textrm{rank}\,}}(\mathcal {M})\).

A formally equivalent way to define \(\rho _{\mathcal {M}}\), which is more convenient for what we do later, is as follows. For any \(S \subseteq N\) and \(\lambda \in \mathbb {R}_{\ge 0}\), we define

$$\begin{aligned} D_{\mathcal {M}}(S, \lambda ) \in \underset{U \subseteq S}{{{\,\textrm{argmax}\,}}}\left\{ |U| - \lambda r(U)\right\} \end{aligned}$$
(1)

to be the unique maximal maximizer of \(\max _{U \subseteq S}\{|U| - \lambda r(U)\}\). It is well-known that each set in the principal sequence \(S_{1}, \dots , S_{k}\) is nonempty and of the form \(D_{\mathcal {M}}(N, \lambda )\) for some \(\lambda \in \mathbb {R}_{\ge 0}\). This leads to the following way to define the rank-density curve, which is the one we use in what follows.

Definition 3.1

(rank-density curve) Let \(\mathcal {M}= (N, \mathcal {I})\) be a matroid. Its rank-density curve \(\rho _{\mathcal {M}} :\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) is defined by

$$\begin{aligned} \rho _{\mathcal {M}}(t) {:=}{\left\{ \begin{array}{ll} \max \left\{ \lambda \in \mathbb {R}_{\ge 0} :r(D_{\mathcal {M}}(N, \lambda )) \ge t\right\} & \forall t \in (0, {{\,\textrm{rank}\,}}(\mathcal {M})], \\ 0 & \forall t > {{\,\textrm{rank}\,}}(\mathcal {M}). \end{array}\right. } \end{aligned}$$

When the matroid \(\mathcal {M}\) is clear from context, we also simply write \(\rho \) instead of \(\rho _{\mathcal {M}}\) for its rank-density curve and \(D(N, \lambda )\) instead of \(D_{\mathcal {M}}(N, \lambda )\). Note that \(\rho \) is piecewise constant, left-continuous, and non-increasing. (See Fig. 1b for an example.) If \(\mathcal {M}\) is a uniformly dense matroid with density \(\lambda \), we have \(\rho (t) = \lambda \) for \(t\in (0,{{\,\textrm{rank}\,}}(\mathcal {M})]\) and \(\rho (t) = 0\) for \(t\in ({{\,\textrm{rank}\,}}(\mathcal {M}),\infty )\).
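
As an illustration of Definition 3.1, the following brute-force sketch evaluates \(D_{\mathcal {M}}(N, \lambda )\) from (1) and the resulting curve \(\rho _{\mathcal {M}}\) for a tiny partition matroid; the chosen matroid and the finite candidate set for \(\lambda \) are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Toy matroid (illustrative): a partition matroid with blocks of sizes 4, 3 and 1;
# a set is independent iff it has at most one element per block, so its rank
# function is r(U) = number of blocks that U intersects.
BLOCKS = [frozenset(range(0, 4)), frozenset(range(4, 7)), frozenset({7})]
GROUND = frozenset(range(8))
SUBSETS = [frozenset(c) for k in range(len(GROUND) + 1) for c in combinations(sorted(GROUND), k)]
RANK = {U: sum(1 for B in BLOCKS if U & B) for U in SUBSETS}

def D(lam):
    """Unique maximal maximizer of |U| - lam * r(U) over U ⊆ N (Eq. (1)); ties in the
    objective are broken towards larger sets, which recovers the maximal maximizer."""
    return max(SUBSETS, key=lambda U: (len(U) - lam * RANK[U], len(U)))

# Every breakpoint of lam -> D(lam) is a ratio (|A| - |B|) / (r(A) - r(B)), so the
# maximum in Definition 3.1 is attained on this finite candidate set.
CANDIDATES = {Fraction(0)} | {
    Fraction(len(A) - len(B), RANK[A] - RANK[B])
    for A in SUBSETS for B in SUBSETS
    if RANK[A] != RANK[B] and (len(A) - len(B)) * (RANK[A] - RANK[B]) >= 0
}

def rho(t):
    """Rank-density curve per Definition 3.1 (0 beyond the rank of the matroid)."""
    if t > RANK[GROUND]:
        return Fraction(0)
    return max(lam for lam in CANDIDATES if RANK[D(lam)] >= t)

if __name__ == "__main__":
    for t in range(1, RANK[GROUND] + 1):  # rho is constant on each interval (t-1, t]
        print(f"rho({t}) = {rho(t)}")     # expected output: 4, 3, 1
```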

We now expand on how \(\rho _{\mathcal {M}}\) is related to the expected offline optimum value \(\mathbb {E}[w({\textrm{OPT}}(\mathcal {M}))]\) of an RA-MSP instance. To this end, we use the function \(\eta _w :[0, |N|] \rightarrow \mathbb {R}_{\ge 0}\) defined by

$$\begin{aligned} \eta _w(a) {:=}\mathbb {E}_{R \sim {{\,\mathrm{{\textrm{Unif}}}\,}}(N, \lfloor a \rfloor )}\left[ \max _{e \in R} w(e)\right] , \end{aligned}$$
(2)

where \({{\,\mathrm{{\textrm{Unif}}}\,}}(N, \lfloor a \rfloor )\) is a uniformly random set of \(\lfloor a \rfloor \) many elements out of N (without repetitions); and we set \(\eta _w(a)=0\) for \(a \in [0,1)\) (i.e., when the set R above is empty) by convention. In words, \(\eta _w(a)\) is the expected maximum weight out of \(\lfloor a \rfloor \) weights chosen uniformly at random from all the weights \(\{w_{e}\}_{e \in N}\). Based on this notion, we assign the following value \(F_w(\rho )\) to a rank-density curve \(\rho \):

$$\begin{aligned} F_w(\rho ) {:=}\int _{0}^{\infty } \eta _w(\rho (t))\, dt = \int _{0}^{{{\,\textrm{rank}\,}}(\mathcal {M})} \eta _w(\rho (t))\, dt, \end{aligned}$$
(3)

where the second equality holds because \(\rho (t) = 0\) for \(t> {{\,\textrm{rank}\,}}(\mathcal {M})\). Note that as the graph of \(\rho \) is a staircase, the above integral is just a finite sum. When the weights w are clear from the context, we usually write \(\eta \) instead of \(\eta _w\), and F instead of \(F_w\).
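
To illustrate the two quantities just defined, the following sketch evaluates \(\eta _w\) exactly by enumerating all subsets of a given size and computes \(F_w(\rho )\) as the corresponding finite sum; the weights and the staircase curve in the demo are made-up examples.

```python
from itertools import combinations
from fractions import Fraction
import math

def eta(weights, a):
    """eta_w(a) from Eq. (2): expected maximum of floor(a) weights drawn uniformly
    without repetition from `weights`; by convention 0 if fewer than one weight is drawn."""
    k = math.floor(a)
    if k < 1:
        return Fraction(0)
    subs = list(combinations(weights, k))
    return Fraction(sum(max(R) for R in subs)) / len(subs)

def F(weights, steps):
    """F_w(rho) from Eq. (3) for a staircase rho given as a list of
    (right_endpoint, density) pairs with increasing right endpoints;
    the integral is just the finite sum of interval lengths times eta values."""
    total, prev = Fraction(0), Fraction(0)
    for right, density in steps:
        total += (Fraction(right) - prev) * eta(weights, density)
        prev = Fraction(right)
    return total

if __name__ == "__main__":
    # made-up weights and the staircase rho(t) = 4 on (0,1], 3 on (1,2], 1 on (2,3]
    w = [Fraction(x) for x in (0, 1, 2, 5, 10)]
    steps = [(1, 4), (2, 3), (3, 1)]
    print("eta_w(2) =", eta(w, 2))    # average maximum over all 10 pairs
    print("F_w(rho) =", F(w, steps))
```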

One key property of the function F is that it can be used as a proxy for the expected value of the offline optimum. More precisely, the statement below shows that \(F(\rho )\) is at most a constant factor smaller than the offline optimum—see Sect. 4 for proof details. This is the direction we need in order to compare the output of our algorithm against the offline optimum. Moreover, \(F(\rho )\) is also at most a constant factor larger than the offline optimum; we do not need this direction for our derivations, but it follows in particular from the fact that our algorithm returns an independent set of expected weight \(\Omega (F(\rho ))\). Lemma 3.2 is phrased in a slightly more general form that allows for applying it not just to the original matroid, but also to any minor thereof, which we need later.

Lemma 3.2

Let \(w_1,\dots , w_n\in \mathbb {R}_{\ge 0}\) be n weights, and let \(\mathcal {M}\) be a matroid with a ground set of size \(k\le n\). Assume we first choose a uniformly random subset of k weights among \(w_1,\dots , w_n\), and then assign these weights uniformly at random to the elements of \(\mathcal {M}\). Then \(\mathbb {E}[w({\textrm{OPT}}(\mathcal {M}))] \le \frac{3e}{e - 1} \cdot F(\rho _{\mathcal {M}})\).

Thus, to be constant-competitive, it suffices to provide an algorithm returning an independent set of expected weight \(\Omega (F(\rho ))\).

3.1.1 RA-MSP subinstances

We will often work with minors of the matroid that is originally given in our RA-MSP instance, and apply certain results to such minors instead of the original matroid. To avoid confusion, we fix throughout the paper one RA-MSP instance with matroid \(\mathcal {M}_{\textrm{orig}} = (N_{\textrm{orig}}, \mathcal {I}_{\textrm{orig}})\), whose ground set size we denote by \(n{:=}|N_{\textrm{orig}}|\) and whose elements have unknown but (adversarially) fixed weights \(w :N_{\textrm{orig}} \rightarrow \mathbb {R}_{\ge 0}\); our goal is to design an O(1)-competitive algorithm for this one instance. The weights w of the original instance are the only weights we consider, even when working with RA-MSP subinstances on minors of \(\mathcal {M}_{\textrm{orig}}\), as their elements also obtain their weights uniformly at random from w. In particular, the function F as defined in (3) is always defined with respect to the original vector of n weights w.

To formally describe the type of matroids we get as subinstances, we introduce the notion of a matroid with w-sampled weights. More precisely, if \(\mathcal {M}\) is a matroid with a ground set of size \(k\le n\), then \(\mathcal {M}\) with w-sampled weights is a randomly weighted version of the matroid, where we pick a uniformly random subset of k of the n entries of w and assign these entries uniformly at random to the ground set of \(\mathcal {M}\). Clearly, any minor of \(\mathcal {M}_{\textrm{orig}}\) is of this type.

Even though we may have \(k<n\), a matroid \(\mathcal {M}\) with w-sampled weights can be interpreted as an RA-MSP instance, as it corresponds to the adversary first choosing uniformly at random a subset of k weights among the weights in w, which then get assigned uniformly at random to the elements.
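
The sampling behind w-sampled weights is easy to spell out explicitly; the following small sketch (with made-up weights and element names) draws one such random assignment for a minor with k elements.

```python
import random

def w_sampled_weights(w, ground):
    """Weights for a matroid with w-sampled weights: a uniformly random subset of
    |ground| entries of w, assigned uniformly at random to the elements of `ground`.
    random.sample returns a uniformly random ordered sample, so zipping it with the
    ground set already yields a uniform assignment."""
    return dict(zip(ground, random.sample(w, len(ground))))

if __name__ == "__main__":
    w = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]        # the n adversarial weights (n = 6)
    ground = ["e1", "e2", "e3", "e4"]         # ground set of a minor with k = 4 elements
    print(w_sampled_weights(w, ground))
```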

3.2 Proof plan for Theorem 1.2 via rank-density curves

We now expand on how one can learn an approximation \(\tilde{\rho }\) of the rank-density curve \(\rho _{\mathcal {M}_{\textrm{orig}}}\) and how this can be exploited algorithmically to return an independent set of expected weight \(\Omega (F(\rho _{\mathcal {M}_{\textrm{orig}}}))\), which by Lemma 3.2 implies O(1)-competitiveness of the procedure. To this end, we start by formalizing the notion of an approximate rank-density curve, which relies on the notion of downshift.

Definition 3.3

Let \(\rho :\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) be a non-increasing function and let \(\alpha , \beta \in \mathbb {R}_{\ge 1}\). The \((\alpha , \beta )\)-downshift \(\rho ':\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) of \(\rho \) is defined via an auxiliary function \(\phi :\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) as follows:

$$\begin{aligned} \phi (t) {:=}{\left\{ \begin{array}{ll} \frac{\rho (\alpha )}{\beta } & \forall t \in (0, 1], \\ \frac{\rho (\alpha \cdot t)}{\beta } & \forall t > 1; \end{array}\right. } \quad \rho '(t) {:=}{\left\{ \begin{array}{ll} 1 & {\text { if }} \phi (t) \in (0, 1), \\ \phi (t) & {\text { otherwise }}. \end{array}\right. } \end{aligned}$$

Moreover, a function \(\tilde{\rho }:\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) is called an \((\alpha , \beta )\)-approximation of \(\rho \) if it is non-increasing and \(\rho '\le \tilde{\rho }\le \rho \), where \(\rho '\) is the \((\alpha , \beta )\)-downshift of \(\rho \).

One helpful way to think about an \((\alpha ,\beta )\)-downshift is as a slightly modified version of \({\rho (\alpha \cdot t)}/{\beta }\). This is also where the name stems from: on a doubly logarithmic scale, \({\rho (\alpha \cdot t)}/{\beta }\) corresponds to shifting the function \(\rho \) to the left and down, by the factors \(\alpha \) and \(\beta \), respectively. This function is then modified in two ways. First, for \(t\in (0,1]\), we lower its value to \({\rho (\alpha )}/{\beta }\). This is done because we are not able to accurately estimate densities for low ranks. Fortunately, this turns out not to be an issue for obtaining a constant-competitive algorithm, because we can offset the loss incurred by this modification by running, with some probability, the classical single secretary algorithm, which returns the heaviest element with constant probability. (This is in particular implied by Lemma 3.4 below and discussed right after.) The second modification is that values in (0, 1) are rounded up to 1. This reflects the fact that density values are always at least one.
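
The downshift operation of Definition 3.3 is mechanical to implement; the following sketch spells it out, using a made-up uniformly dense example curve for the demonstration.

```python
def downshift(rho, alpha, beta):
    """(alpha, beta)-downshift of a non-increasing curve rho (Definition 3.3),
    returned as a new function of t > 0."""
    def phi(t):
        return rho(alpha) / beta if t <= 1 else rho(alpha * t) / beta
    def rho_prime(t):
        v = phi(t)
        return 1.0 if 0 < v < 1 else v   # values in (0, 1) are rounded up to 1
    return rho_prime

if __name__ == "__main__":
    # rank-density curve of a uniformly dense matroid with density 5 and rank 10
    rho = lambda t: 5.0 if 0 < t <= 10 else 0.0
    rho2 = downshift(rho, 2, 3)
    print([rho2(t) for t in (0.5, 1, 4, 5, 6)])   # 5/3, 5/3, 5/3, 5/3, 0.0
```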

One issue when working with an (O(1), O(1))-approximation \(\tilde{\rho }\) of \(\rho \) is that \(F(\tilde{\rho })\) may be more than a constant factor smaller than \(F(\rho )\), so we cannot simply compare against \(F(\tilde{\rho })\) to obtain an O(1)-competitive procedure. This happens because of the way \((\alpha ,\beta )\)-downshifts are defined; more precisely, because the values for \(t\in (0,1]\) are lowered to \({\rho (\alpha )}/{\beta }\). However, as the following lemma shows, also in this case we can obtain a simple lower bound on \(F(\tilde{\rho })\) in terms of \(F(\rho )\) and the largest weight \(w_{\max }\) in w — a proof of the statement can be found at the end of Sect. 4.

Lemma 3.4

Let \(\mathcal {M}\) be a matroid with w-sampled weights, let \(\alpha ,\beta \in \mathbb {R}_{\ge 1}\), and let \(\tilde{\rho }\) be an \((\alpha , \beta )\)-approximation of \(\rho =\rho _{\mathcal {M}}\). Then \(F(\rho ) \le 2 \alpha \beta F(\tilde{\rho }) + \alpha w_{\max }\).

A key implication of Lemma 3.4 is that it suffices to obtain an algorithm that returns an independent set of expected weight \(\Omega (F(\tilde{\rho }))\) for some (O(1), O(1))-approximation \(\tilde{\rho }\) of \(\rho _{\mathcal {M}_{\textrm{orig}}}\). Indeed, Lemma 3.4 then implies \(F(\tilde{\rho }) = \Omega (F(\rho _{\mathcal {M}_{\textrm{orig}}})) - O(w_{\max })\). By running this algorithm with some probability (say 0.5) and otherwise Dynkin’s [2] classical secretary algorithm, which picks the heaviest element with constant probability, an overall algorithm is obtained that returns an independent set of expected weight \(\Omega (F(\rho _{\mathcal {M}_{\textrm{orig}}}))\). Hence, Lemma 3.4 helps to provide bounds on the competitiveness of algorithms that are competitive with the F-value of an approximate rank-density curve. This technique is also used in the following key statement, which shows that an algorithm with strong guarantees can be obtained if we are given an (O(1), O(1))-approximation of the rank-density curve of the matroid on which we work—see Sect. 6 for the proof.

Theorem 3.5

Let \(\mathcal {M}\) be a matroid with w-sampled weights, and let \(\rho _{\mathcal {M}}\) denote the rank-density curve of \(\mathcal {M}\). Assume we are given an \((\alpha , \beta )\)-approximation \(\tilde{\rho }\) of \(\rho _{\mathcal {M}}\) for integers \(\alpha \ge 24\) and \(\beta \ge 3\). Then there is an efficient procedure \({\textrm{ALG}}(\tilde{\rho }, \alpha , \beta )\) that, when run on the RA-MSP subinstance given by \(\mathcal {M}\), returns an independent set I of \(\mathcal {M}\) of expected weight at least \(\left( \tfrac{1}{1440 e \alpha ^{2} \beta ^{2}}\right) \left( F(\rho _{\mathcal {M}})- \alpha ^{2} w_{\max } \right) \).

The last main ingredient of our approach is to show that such an accurate proxy \(\tilde{\rho }\) can be computed with constant probability. More precisely, we show that, after observing a sample set S containing every element of \(N_{\textrm{orig}}\) independently with probability \({1}/{2}\), the rank-density curve of (the observed) \(\left. \mathcal {M}_{\textrm{orig}}\right| _{S}\)

  • is close to the rank-density curve of \(\left. \mathcal {M}_{\textrm{orig}}\right| _{N_{\textrm{orig}} \setminus S}\), allowing us to use \(\rho _{\left. \mathcal {M}_{\textrm{orig}}\right| _{S}}\) as the desired proxy for the RA-MSP subinstance given by \(\left. \mathcal {M}_{\textrm{orig}}\right| _{N_{\textrm{orig}} \setminus S}\), and

  • is close to the rank-density curve of \(\mathcal {M}_{\textrm{orig}}\), which allows for relating the offline optimum of the RA-MSP subinstance given by \(\left. \mathcal {M}_{\textrm{orig}}\right| _{N_{\textrm{orig}} \setminus S}\) to the one of \(\mathcal {M}_{\textrm{orig}}\).

We highlight that the next result is purely structural and hence independent of weights or the MSP setting. See Sect. 5 for details.

Theorem 3.6

Let \(\mathcal {M}=(N, \mathcal {I})\) be a matroid and \(S \subseteq N\) be a random set containing every element of N independently with probability \({1}/{2}\). Then, with probability at least \({1}/{100}\), \(\rho _{\left. \mathcal {M}\right| _{S}}\) and \(\rho _{\left. \mathcal {M}\right| _{N {\setminus } S}}\) are both (288, 9)-approximations of \(\rho _{\mathcal {M}}\).

Combining the above results, we get the desired O(1)-competitive algorithm.

Proof of Theorem 1.2

For brevity, let \(\mathcal {M}{:=}\mathcal {M}_{\textrm{orig}}\) and \(N {:=}N_{\textrm{orig}}\) throughout this proof. Recall that by Lemma 3.2, it suffices to provide an algorithm returning an independent set of expected weight \(\Omega (F(\rho _{\mathcal {M}}))\). Consider the following procedure: First observe (without picking any element) a set \(S \subseteq N\) containing every element of N independently with probability \({1}/{2}\) and let \(\tilde{\rho }\) denote the (288, 9)-downshift of \(\rho _{\left. \mathcal {M}\right| _{S}}\). Then run the algorithm described in Theorem 3.5 on \(\left. \mathcal {M}\right| _{N {\setminus } S}\) with \(\tilde{\rho }\) as the approximate rank-density curve. Let I denote the output of the above procedure and let \(\mathcal {A}\) be the event defined in Theorem 3.6, that is,

$$ \mathcal {A} = \{S \subseteq N :\rho _{\left. \mathcal {M}\right| _{S}} {\text { and }} \rho _{\left. \mathcal {M}\right| _{N{\setminus } S}} {\text { are }} (288, 9){\text {-approximations of }} \rho _{\mathcal {M}}\}. $$

A key property we exploit is that for any \(S\in \mathcal {A}\), we have that \(\tilde{\rho }\) is a \((288^2, 9^2)\)-approximation of \(\rho _{\left. \mathcal {M}\right| _{N\setminus S}}\) due to the following. First, because \(\tilde{\rho }\) is the (288, 9)-downshift of \(\rho _{\left. \mathcal {M}\right| _{S}} \le \rho _{\mathcal {M}}\), and \(\rho _{\left. \mathcal {M}\right| _{N \setminus S}}\) is a (288, 9)-approximation of \(\rho _{\mathcal {M}}\), we have \(\rho _{\left. \mathcal {M}\right| _{N \setminus S}} \ge \tilde{\rho }\). Moreover, the approximation parameter \((288^{2}, 9^{2})\) follows by using that the \((\alpha _{2}, \beta _{2})\)-downshift of the \((\alpha _{1}, \beta _{1})\)-downshift of some rank-density function is an \((\alpha _{1} \alpha _{2}, \beta _{1} \beta _{2})\)-approximation of that rank-density function — see Lemma 4.3 for a proof of this property. This property can be applied as follows. Let \(\bar{\rho }\) be the (288, 9)-downshift of \(\rho _{\left. \mathcal {M}\right| _{N\setminus S}}\le \rho _{\mathcal {M}}\). Because \(\rho _{\left. \mathcal {M}\right| _{S}}\) is a (288, 9)-approximation of \(\rho _{\mathcal {M}}\), we have \(\bar{\rho }\le \rho _{\left. \mathcal {M}\right| _{S}}\). Hence, because \(\tilde{\rho }\) is the (288, 9)-downshift of \(\rho _{\left. \mathcal {M}\right| _{S}}\), it lies above the (288, 9)-downshift of \(\bar{\rho }\); as \(\bar{\rho }\) is itself the (288, 9)-downshift of \(\rho _{\left. \mathcal {M}\right| _{N\setminus S}}\), the above property yields that \(\tilde{\rho }\) lies above the \((288^{2}, 9^{2})\)-downshift of \(\rho _{\left. \mathcal {M}\right| _{N\setminus S}}\). Thus, \(\tilde{\rho }\) is a \((288^2,9^2)\)-approximation of \(\rho _{\left. \mathcal {M}\right| _{N{\setminus } S}}\) as claimed.

Using this fact, we obtain for any fixed \(S\in \mathcal {A}\)

$$\begin{aligned} \mathbb {E}[w(I) \mid S]&\ge \left( \frac{1}{1440 e \cdot 288^{4} \cdot 9^{4}}\right) \left( F \left( \rho _{\left. \mathcal {M}\right| _{N\setminus S}} \right) - 288^{4} w_{\max } \right) \\&\ge \left( \frac{1}{1440 e \cdot 288^{4} \cdot 9^{4}}\right) \left( \frac{1}{2\cdot 288\cdot 9}(F ( \rho _{\mathcal {M}} ) - 288 w_{\max }) - 288^{4} w_{\max } \right) \\&\ge \left( \frac{1}{1440 e \cdot 288^{4} \cdot 9^{4}}\right) \left( \frac{1}{2\cdot 288\cdot 9}F ( \rho _{\mathcal {M}} ) - 2\cdot 288^{4} w_{\max } \right) \\&= \left( \frac{1}{2880 e \cdot 288^{5} \cdot 9^{5}}\right) F ( \rho _{\mathcal {M}} ) - \frac{w_{\max }}{720 e \cdot 9^4}, \end{aligned}$$

where the first inequality follows from Theorem 3.5 and the fact that \(\tilde{\rho }\) is a \((288^2,9^2)\)-approximation of \(\rho _{\left. \mathcal {M}\right| _{N{\setminus } S}}\) as discussed above, while the second inequality follows from Lemma 3.4 and the fact that, for every \(S \in \mathcal {A}\), the curve \(\rho _{\left. \mathcal {M}\right| _{N \setminus S}}\) is a (288, 9)-approximation of \(\rho _{\mathcal {M}}\). Moreover, the first inequality uses that conditioning on any fixed \(S \in \mathcal {A}\) does not have any impact on the uniform assignment of the weights w to the elements. This holds because the event \(\mathcal {A}\) only depends on the sampled set S but not on the weights of its elements. Hence, the RA-MSP subinstance given by \(\left. \mathcal {M}\right| _{N\setminus S}\) on which we use the algorithm described in Theorem 3.5 indeed assigns weights of w uniformly at random to elements, as required. It then follows that the output of the above procedure satisfies

$$\begin{aligned} \mathbb {E}[w(I)]&\ge \sum _{S \in \mathcal {A}} \mathbb {E}[w(I) \mid S] \Pr [S] \ge \tfrac{1}{100} \left( \left( \tfrac{1}{2880 e \cdot 288^{5} \cdot 9^{5}}\right) F ( \rho _{\mathcal {M}} ) - \tfrac{w_{\max }}{720e \cdot 9^4} \right) , \end{aligned}$$

where the last inequality uses that \(\Pr [\mathcal {A}] \ge {1}/{100}\) by Theorem 3.6.

Since running the classical secretary algorithm on \(\mathcal {M}_{\textrm{orig}}\) returns an independent set of expected weight at least \({w_{\max }}/{e}\), by running the procedure described above with probability \({1}/{2}\), and running the classical secretary algorithm otherwise, we return an independent set of expected weight at least

$$\begin{aligned}&\frac{1}{2}\left[ \frac{w_{\max }}{e} + \frac{1}{100} \left( \left( \frac{1}{2880 e \cdot 288^{5} \cdot 9^{5}}\right) F( \rho _{\mathcal {M}} ) - \frac{w_{\max }}{720e \cdot 9^4} \right) \right] \\&\quad = \frac{1}{2}\left[ \left( \frac{1}{e} - \frac{1}{100 \cdot 720e \cdot 9^4} \right) w_{\max } + \left( \frac{1}{100 \cdot 2880 e \cdot 288^{5} \cdot 9^{5}}\right) F ( \rho _{\mathcal {M}} ) \right] \\&\quad \ge \left( \frac{1}{2 \cdot 100 \cdot 2880 e \cdot 288^{5} \cdot 9^{5}}\right) F ( \rho _{\mathcal {M}} ) \ge \left( \frac{1}{3 \cdot 100 \cdot 3000 \cdot 3 \cdot 300^{5} \cdot 9^{5}}\right) F ( \rho _{\mathcal {M}} )\\&\quad = \left( \frac{1}{9^9 \cdot 10^{15}}\right) F \left( \rho _{\mathcal {M}} \right) \ge \left( \frac{1}{9^{10} \cdot 10^{15}}\right) \mathbb {E}[w({\textrm{OPT}}(\mathcal {M}))], \end{aligned}$$

where the last inequality follows from Lemma 3.2. \(\square \)

4 Rank-density curves and their properties

In this section we take a closer look at rank-density curves, and prove Lemmas 3.2 and 3.4. We start by stating some useful properties of the function \(\eta \).

Lemma 4.1

Let \(\eta \) be as defined in (2). Then

  1. (i)

    \(\eta \) is non-decreasing.

  2. (ii)

    \(\eta (ah) \le 2a \eta (h)\) for all \(a \ge 1\) and \(h \in [1, \frac{n}{a}]\).

  3. (iii)

    Let \(X \sim B(m, p)\) with \(1 \le mp \le m \le n\). Then \(\mathbb {E}[\eta (X)] \le 3 \eta (mp)\).

Proof

  1. (i)

    This follows immediately from the definition of \(\eta \).

  2. (ii)

    Consider a consecutive numbering of the n entries of the weight vector \(w\in \mathbb {R}^n\) in non-decreasing order, i.e., \(w_1 \le w_2 \le \cdots \le w_n\). For each \(k, i \in [n]\) let p(k, i) denote the probability that among k samples, element i is the heaviest and has the largest index among the heaviest elements (in case of ties). In other words, p(k, i) is the probability that \(w_{i}\) is a heaviest weight out of the k sampled ones and none of \(w_{j}\) with \(j > i\) is in the sample; equivalently, i is the largest sampled index. Thus for \(i < k\) we have \(p(k, i) = 0\) and for \(i \ge k\) we have

    $$ p(k, i) = \frac{\left( {\begin{array}{c}i - 1\\ k - 1\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } = k \frac{(i - 1)! (n - k)!}{n! (i - k)!} = \frac{k}{n} \prod _{j = 1}^{k - 1} \frac{i - j}{n - j}. $$

    Next, note that for all \(h \in [1, \frac{n}{a}]\) we have

    $$\begin{aligned} \eta (h)&= \sum _{i = 1}^{n} p(\lfloor h \rfloor , i) w_{i}, {\text { and}}\\ \eta (ah)&= \sum _{i = 1}^{n} p(\lfloor ah \rfloor , i) w_{i}. \end{aligned}$$

    Our goal is to show that \(p(\lfloor ah \rfloor , i) \le 2 a p(\lfloor h \rfloor , i)\) for all \(i \in [n]\), which is sufficient to prove the desired inequality. To this end, note that for \(i < \lfloor ah \rfloor \) we have \(p(\lfloor ah \rfloor , i) = 0 \le a p(\lfloor h \rfloor , i)\), and for \(i \ge \lfloor ah \rfloor \) we have

    $$\begin{aligned} p(\lfloor ah \rfloor , i)&= \frac{\lfloor ah \rfloor }{n} \prod _{j = 1}^{\lfloor ah \rfloor - 1} \frac{i - j}{n - j} \le \frac{\lfloor ah \rfloor }{n} \prod _{j = 1}^{\lfloor h \rfloor - 1} \frac{i - j}{n - j} \\&\le 2a \cdot \frac{\lfloor h \rfloor }{n} \prod _{j = 1}^{\lfloor h \rfloor - 1} \frac{i - j}{n - j} = 2a p(\lfloor h \rfloor , i). \end{aligned}$$

    Here the first inequality follows from \(\lfloor h \rfloor \le \lfloor ah \rfloor \) and the fact that \(\frac{i - j}{n - j} \le 1\) for any \(j < n\), and the second inequality uses that \(\lfloor ah \rfloor \le ah \le 2a \lfloor h \rfloor \) because \(h \le 2 \lfloor h \rfloor \) for \(h \in \mathbb {R}_{\ge 1}\). (Both the formula for \(\eta \) above and the resulting bound are easy to verify numerically; see the sketch after the proof.)

  3. (iii)

    Note that \(\eta (h) \le {2\,h}/{mp} \cdot \eta (mp)\) for all \(h \in [mp, n]\) by property (ii). Therefore,

    $$\begin{aligned} \mathbb {E}\left[ \eta (X)\right]&= \sum _{h = 0}^{\lfloor mp \rfloor } \eta (h) \cdot \Pr \left[ X = h\right] + \sum _{h = \lfloor mp \rfloor + 1}^{m} \eta (h) \cdot \Pr \left[ X = h\right] \\&\le \eta (mp) \cdot \Pr \left[ X \le \lfloor mp \rfloor \right] + \sum _{h = \lfloor mp \rfloor + 1}^{m} 2 \frac{h}{mp} \cdot \eta (mp) \cdot \Pr \left[ X = h\right] \\&= \eta (mp) \cdot \Pr \left[ X \le \lfloor mp \rfloor \right] + 2 \frac{\eta (mp)}{mp} \cdot \sum _{h = \lfloor mp \rfloor + 1}^{m} h \cdot \Pr \left[ X = h\right] \\&\le \eta (mp) + 2 \frac{\eta (mp)}{mp} \cdot \mathbb {E}\left[ X\right] = 3\eta (mp). \end{aligned}$$

\(\square \)
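
The identity \(\eta (k) = \sum _{i} p(k, i) w_{i}\) and property (ii) used in the proof above are easy to verify numerically; the following sketch does so on made-up weights (the exhaustive enumeration is only meant for validation on tiny inputs).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def eta_bruteforce(w, k):
    """eta(k): average of the maximum over all k-element subsets of the weights."""
    subs = list(combinations(w, k))
    return Fraction(sum(max(R) for R in subs)) / len(subs)

def eta_formula(w, k):
    """eta(k) via eta(k) = sum_i p(k, i) * w_i with the weights sorted non-decreasingly
    and p(k, i) = C(i-1, k-1) / C(n, k), the probability that i is the largest sampled index."""
    w = sorted(w)
    n = len(w)
    return sum(Fraction(comb(i - 1, k - 1), comb(n, k)) * w[i - 1] for i in range(1, n + 1))

if __name__ == "__main__":
    w = [Fraction(x) for x in (0, 2, 2, 5, 7, 11)]
    for k in range(1, len(w) + 1):
        assert eta_bruteforce(w, k) == eta_formula(w, k)
    # spot-check property (ii) of Lemma 4.1 with a = 3 and h = 2: eta(3*2) <= 2*3*eta(2)
    assert eta_bruteforce(w, 6) <= 2 * 3 * eta_bruteforce(w, 2)
    print("identity and property (ii) hold on this example")
```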

In order to prove Lemma 3.2, we use the following result from [8], which relies on the notion of the random partition matroid \({\mathcal {P}} = (N, \mathcal {I}')\) associated to a given matroid \(\mathcal {M}= (N, \mathcal {I})\). The former is constructed as follows. First, every element \(e \in N\) is assigned to one of \({{\,\textrm{rank}\,}}(\mathcal {M})\) many classes \(P_{1}, \dots , P_{{{\,\textrm{rank}\,}}(\mathcal {M})}\) uniformly at random and independently of each other. A set \(S \subseteq N\) is then independent in \({\mathcal {P}}\) if it contains at most one element from each class, i.e., \(S \in \mathcal {I}'\) if \(|S \cap P_{i}| \le 1\) for every \(i \in [{{\,\textrm{rank}\,}}(\mathcal {M})]\). The next result relates the value of the offline optimum of \(\mathcal {M}\) with that of the random partition matroids associated to the principal minors of \(\mathcal {M}\).
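
For concreteness, a minimal sketch of this random construction could look as follows; the independence test, the names, and the demo ground set are illustrative choices rather than a full matroid implementation.

```python
import random

def random_partition_matroid(ground, rank):
    """Build the random partition matroid associated to a matroid on `ground` with the
    given rank: each element is assigned to one of rank many classes uniformly and
    independently; a set is independent iff it has at most one element per class."""
    classes = {e: random.randrange(rank) for e in ground}
    def is_independent(S):
        hit = set()
        for e in S:
            if classes[e] in hit:
                return False
            hit.add(classes[e])
        return True
    return classes, is_independent

if __name__ == "__main__":
    classes, indep = random_partition_matroid(["a", "b", "c", "d", "e"], rank=3)
    print(classes, indep({"a", "b"}))
```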

Lemma 4.2

([8, Lemma 4.2]) Let \((\mathcal {M}, w)\) be a random-assignment MSP instance. Let \(\{\mathcal {M}_{i}\}_{i = 1}^{k}\) be the principal minors of \(\mathcal {M}\) and let \({\mathcal {P}}_{i}\) denote the random partition matroid associated to each \(\mathcal {M}_{i}\), respectively. Then

$$ \mathbb {E}\left[ w({\textrm{OPT}}(\mathcal {M}))\right] \le \frac{e}{e - 1} \mathbb {E}\left[ w({\textrm{OPT}}(\oplus _{i = 1}^{k} {\mathcal {P}}_{i}))\right] , $$

where the expectations are taken with respect to the random weight assignment and the random partitioning (this only applies to the right-hand side expectation).

Now we are ready to prove Lemma 3.2.

Proof of Lemma 3.2

Write \(\mathcal {M}'{:=}\mathcal {M}\) for the matroid with w-sampled weights from the statement. Let \((\mathcal {M}'_{i})_{i = 1}^{k}\) be the principal minors of \(\mathcal {M}'\) and let \(n'_{i}\) and \(r'_{i}\) denote the cardinality and rank of each \(\mathcal {M}'_{i}\), respectively. Additionally, for every \(i \in [k]\), let \({\mathcal {P}}'_{i}\) be the random partition matroid associated to \(\mathcal {M}'_{i}\) and let \((P'_{i, j})_{j = 1}^{r'_{i}}\) denote the (random) partitions of \({\mathcal {P}}'_{i}\). By Lemma 4.2 we have

$$ \mathbb {E}\left[ w({\textrm{OPT}}(\mathcal {M}'))\right] \le \frac{e}{e - 1} \mathbb {E}\left[ w({\textrm{OPT}}(\oplus _{i = 1}^{k} {\mathcal {P}}'_{i}))\right] = \frac{e}{e - 1} \sum _{i = 1}^{k} \mathbb {E}\left[ w({\textrm{OPT}}({\mathcal {P}}'_{i}))\right] . $$

Next, note that for every \(i \in [k]\) we have

$$\begin{aligned} \mathbb {E}[w({\textrm{OPT}}({\mathcal {P}}'_{i}))]&= \sum _{j = 1}^{r'_{i}} \mathbb {E}\left[ w({\textrm{OPT}}(\left. {\mathcal {P}}'_{i}\right| _{P'_{i, j}}))\right] \\ &= \sum _{j = 1}^{r'_{i}} \mathbb {E}\left[ \eta _{w}(|P'_{i, j}|)\right] = r'_{i} \cdot \mathbb {E}_{X \sim B(n'_{i}, \, 1 / r'_{i})} [\eta _{w}(X)], \end{aligned}$$

where the first equality follows from the definition of \({\mathcal {P}}'_{i}\) and linearity of expectation, the second one uses the definition of \(\eta _{w}\), and the third one holds since \(|P'_{i, j}| \sim B(n'_{i}, 1 / r'_{i})\) for every \(j \in [r'_{i}]\). Thus, by applying Lemma 4.1 (iii) and using that by construction every \(\mathcal {M}'_{i}\) is uniformly dense with density \(\lambda '_{i} = n'_{i} / r'_{i}\), we get:

$$ \mathbb {E}[w({\textrm{OPT}}({\mathcal {P}}'_{i}))] \le r'_{i} \cdot 3\eta (n'_{i} / r'_{i}) = 3 r'_{i} \eta (\lambda '_{i}). $$

Hence

$$ \mathbb {E}\left[ w({\textrm{OPT}}(\mathcal {M}'))\right] \le \frac{3e}{e - 1} \sum _{i = 1}^{k} r'_{i} \eta (\lambda '_{i}) = \frac{3e}{e - 1} F(\rho _{\mathcal {M}'}), $$

where the equality holds by definition of \(F_{w}\). \(\square \)

The following result shows that, simply put, in terms of Definition 3.3, an approximation of an approximation of a function is also an approximation of the original function, where the approximation parameters \(\alpha \) and \(\beta \) are multiplied.

Lemma 4.3

Let \(\rho _{1}, \rho _{2}, \rho _{3} :\mathbb {R}_{> 0} \rightarrow \mathbb {R}_{\ge 0}\) be non-increasing functions such that \(\rho _{2}\) is an \((\alpha _{1}, \beta _{1})\)-approximation of \(\rho _{1}\) and \(\rho _{3}\) is an \((\alpha _{2}, \beta _{2})\)-approximation of \(\rho _{2}\) for some parameters \(\alpha _{1}, \beta _{1}, \alpha _{2}, \beta _{2} \in \mathbb {R}_{\ge 1}\). Then \(\rho _{3}\) is an \((\alpha _{1} \alpha _{2}, \beta _{1} \beta _{2})\)-approximation of \(\rho _{1}\).

Proof

Let \(\rho _{1}'\) be the \((\alpha _{1}, \beta _{1})\)-downshift of \(\rho _{1}\) and let \(\phi _{1}\) be the auxiliary function used to construct \(\rho _{1}'\) in Definition 3.3. Similarly, let \(\rho _{2}'\) and \(\phi _{2}\) be the \((\alpha _{2}, \beta _{2})\)-downshift of \(\rho _{2}\) and the corresponding auxiliary function, respectively, and let \(\rho _{1}''\) and \(\phi _{1}'\) be the \((\alpha _{1} \alpha _{2}, \beta _{1} \beta _{2})\)-downshift of \(\rho _{1}\) and the corresponding auxiliary function, respectively.

Since \(\rho _{3} \le \rho _{2} \le \rho _{1}\), it only remains to show that \(\rho _{3} \ge \rho _{1}''\). Observe that to obtain this bound it suffices to prove the following two properties (for a function g taking real values, we denote by \({{\,\textrm{supp}\,}}(g)\) the non-zero support of g, that is all points t in its domain for which \(g(t) \ne 0\)):

  1. (i)

    \(\rho _{3}(t) \ge \phi _{1}'(t)\) for every \(t > 0\), and

  2. (ii)

    \(\rho _{3}(t) \ge 1\) for every \(t \in {{\,\textrm{supp}\,}}(\rho _{1}'')\).

Indeed, on the one hand, for every \(t > 0\) such that \(\phi _{1}'(t) = \rho _{1}''(t)\) the first property implies \(\rho _{3}(t) \ge \rho _{1}''(t)\). On the other hand, for every \(t > 0\) such that \(\phi _{1}'(t) \ne \rho _{1}''(t)\) we have \(\rho _{1}''(t) = 1\), so \(\rho _{3}(t) \ge \rho _{1}''(t)\) follows from the second property.

First, we prove property (i). To this end, observe that for every \(t > 0\) we have \(\rho _{3}(t) \ge \rho _{2}'(t) \ge \phi _{2}(t)\) and \(\rho _{2}(t) \ge \rho _{1}'(t) \ge \phi _{1}(t)\). Therefore,

$$\begin{aligned} \rho _{3}(t)&\ge \phi _{2}(t) = {\left\{ \begin{array}{ll} \frac{\rho _{2}(\alpha _{2})}{\beta _{2}} & \forall t \in (0, 1), \\ \frac{\rho _{2}(\alpha _{2} t)}{\beta _{2}} & \forall t \ge 1 \end{array}\right. } \\&\ge {\left\{ \begin{array}{ll} \frac{\phi _{1}(\alpha _{2})}{\beta _{2}} & \forall t \in (0, 1), \\ \frac{\phi _{1}(\alpha _{2} t)}{\beta _{2}} & \forall t \ge 1 \end{array}\right. } = {\left\{ \begin{array}{ll} \frac{\rho _{1}(\alpha _{1} \alpha _{2})}{\beta _{1} \beta _{2}} & \forall t \in (0, 1), \\ \frac{\rho _{1}(\alpha _{1} \alpha _{2} t)}{\beta _{1} \beta _{2}} & \forall t \ge 1 \end{array}\right. } = \phi _{1}'(t). \end{aligned}$$

Here the first and second inequalities hold by the above observation, the first and the last equalities hold by construction of \(\phi _{2}\) and \(\phi _{1}'\), respectively, and the second equality holds by construction of \(\phi _{1}\) and the fact that \(\alpha _{2} \ge 1\).

Now, let us show property (ii). First, note that for every \(t \in {{\,\textrm{supp}\,}}(\rho _{2}')\) by construction we have \(\rho _{3}(t) \ge \rho _{2}'(t) \ge 1\). Next, note that by construction

$$ {{\,\textrm{supp}\,}}(\rho _{2}') = {{\,\textrm{supp}\,}}(\phi _{2}) = {\left\{ \begin{array}{ll} \emptyset & {\text { if }} \alpha _{2} \notin {{\,\textrm{supp}\,}}(\rho _{2}), \\ \{t > 0 :\alpha _{2} t \in {{\,\textrm{supp}\,}}(\rho _{2})\} & {\text { otherwise }}. \end{array}\right. } $$

Since \({{\,\textrm{supp}\,}}(\rho _{2}) \supseteq {{\,\textrm{supp}\,}}(\rho _{1}')\), this implies

$$ {{\,\textrm{supp}\,}}(\rho _{2}') \supseteq {\left\{ \begin{array}{ll} \emptyset & {\text { if }} \alpha _{2} \notin {{\,\textrm{supp}\,}}(\rho _{1}'), \\ \{t > 0 :\alpha _{2} t \in {{\,\textrm{supp}\,}}(\rho _{1}')\} & {\text { otherwise }}. \end{array}\right. } $$

Similarly, note that

$$ {{\,\textrm{supp}\,}}(\rho _{1}') = {{\,\textrm{supp}\,}}(\phi _{1}) = {\left\{ \begin{array}{ll} \emptyset & {\text { if }} \alpha _{1} \notin {{\,\textrm{supp}\,}}(\rho _{1}), \\ \{t > 0 :\alpha _{1} t \in {{\,\textrm{supp}\,}}(\rho _{1})\} & {\text { otherwise }}. \end{array}\right. } $$

Thus, the condition \(\alpha _{2} \notin {{\,\textrm{supp}\,}}(\rho _{1}')\) is equivalent to the following: either \(\alpha _{1} \notin {{\,\textrm{supp}\,}}(\rho _{1})\) or \(\alpha _{1} \alpha _{2} \notin {{\,\textrm{supp}\,}}(\rho _{1})\). Since \(\alpha _{2} \ge 1\) and the support of a non-increasing function is downward closed, the first condition implies the second, so the disjunction is equivalent to \(\alpha _{1} \alpha _{2} \notin {{\,\textrm{supp}\,}}(\rho _{1})\). Therefore, we get:

$$ {{\,\textrm{supp}\,}}(\rho _{2}') \supseteq {\left\{ \begin{array}{ll} \emptyset & {\text { if }} \alpha _{1} \alpha _{2} \notin {{\,\textrm{supp}\,}}(\rho _{1}), \\ \{t > 0 :\alpha _{1} \alpha _{2} t \in {{\,\textrm{supp}\,}}(\rho _{1})\} & {\text { otherwise }} \end{array}\right. } = {{\,\textrm{supp}\,}}(\rho _{1}''), $$

where the equality follows by construction of \({{\,\textrm{supp}\,}}(\rho _{1}'')\). Thus, \(\rho _{3}(t) \ge 1\) holds for every \(t \in {{\,\textrm{supp}\,}}(\rho _{2}') \supseteq {{\,\textrm{supp}\,}}(\rho _{1}'')\), so property (ii) holds. \(\square \)

We conclude this section by providing a proof of Lemma 3.4.

Proof of Lemma 3.4

First observe that the statement trivially holds if we have \(\alpha \ge {{\,\textrm{rank}\,}}(\mathcal {M})\), because \(F(\rho ) \le {{\,\textrm{rank}\,}}(\mathcal {M}) \cdot w_{\max } \le \alpha w_{\max }\). Hence, in what follows we assume \(\alpha < {{\,\textrm{rank}\,}}(\mathcal {M})\).

Let \(\rho ':\mathbb {R}_{>0} \rightarrow \mathbb {R}_{\ge 0}\) be the \((\alpha ,\beta )\)-downshift of \(\rho \). By definition of the function F and because \(\tilde{\rho } \ge \rho '\), we get

$$\begin{aligned} F(\tilde{\rho }) = \int _0^{\infty } \eta (\tilde{\rho }(t))\,dt \ge \int _0^{\infty } \eta (\rho '(t))\,dt = \int _0^{\frac{{{\,\textrm{rank}\,}}(\mathcal {M})}{\alpha }} \eta (\rho '(t))\,dt, \end{aligned}$$
(4)

where the equality at the end follows from the fact that \(\rho '(t)=0\) for \(t > {{{\,\textrm{rank}\,}}(\mathcal {M})}/{\alpha }\). We now distinguish between \(\beta \ge n\) and \(\beta < n\).

Consider first the case \(\beta \ge n\). In this case we can expand as follows:

$$\begin{aligned} \int _0^{\frac{{{\,\textrm{rank}\,}}(\mathcal {M})}{\alpha }} \eta (\rho '(t))\,dt&\ge \frac{{{\,\textrm{rank}\,}}(\mathcal {M})}{\alpha } \cdot \eta (1) \ge \frac{1}{2\alpha \beta } {{\,\textrm{rank}\,}}(\mathcal {M}) \cdot 2n \eta (1) \\&\ge \frac{1}{2\alpha \beta } {{\,\textrm{rank}\,}}(\mathcal {M})\cdot \eta (n)\\&= \frac{1}{2\alpha \beta } {{\,\textrm{rank}\,}}(\mathcal {M})\cdot w_{\max } \ge \frac{1}{2\alpha \beta } F(\rho ), \end{aligned}$$

where the first inequality holds because \(\rho '\) is at least 1 when it is non-zero and \(\rho '\) is non-zero within \([0,{{{\,\textrm{rank}\,}}(\mathcal {M})}/{\alpha }]\) because \(\alpha \le {{\,\textrm{rank}\,}}(\mathcal {M})\), the second inequality uses \(\beta \ge n\), the third one is due to \(2n\eta (1) \ge \eta (n)\)—which is a consequence of Lemma 4.1 (ii)—the equality thereafter uses \(\eta (n)=w_{\max }\) and the final inequality follows from \({{\,\textrm{rank}\,}}(\mathcal {M})\cdot w_{\max } \ge F(\rho )\). The above relation together with Eq. (4) implies the desired result when \(\beta \ge n\).

Now assume \(\beta < n\). In this case we continue as follows:

$$\begin{aligned} \int _0^{\frac{{{\,\textrm{rank}\,}}(\mathcal {M})}{\alpha }} \eta (\rho '(t))\,dt&\ge \int _1^{\frac{{{\,\textrm{rank}\,}}(\mathcal {M})}{\alpha }} \eta (\rho '(t))\,dt\\&= \int _1^{\frac{{{\,\textrm{rank}\,}}(\mathcal {M})}{\alpha }} \eta \left( \max \left\{ \frac{\rho (\alpha t)}{\beta },1\right\} \right) \,dt\\&= \frac{1}{\alpha } \int _{\alpha }^{{{\,\textrm{rank}\,}}(\mathcal {M})} \eta \left( \max \left\{ \frac{\rho (t)}{\beta },1\right\} \right) \,dt\\&\ge \frac{1}{2 \alpha \beta } \int _{\alpha }^{{{\,\textrm{rank}\,}}(\mathcal {M})} \eta \left( \max \left\{ \rho (t),\beta \right\} \right) \,dt\\&\ge \frac{1}{2 \alpha \beta } \int _{\alpha }^{{{\,\textrm{rank}\,}}(\mathcal {M})} \eta (\rho (t))\,dt\\&= \frac{1}{2 \alpha \beta } \left( \int _{0}^{{{\,\textrm{rank}\,}}(\mathcal {M})} \eta (\rho (t))\,dt - \int _0^{\alpha } \eta (\rho (t))\,dt\right) \\&\ge \frac{1}{2 \alpha \beta } (F(\rho ) - \alpha w_{\max }), \end{aligned}$$

where the first equality is a consequence of \(\rho '\) being the \((\alpha ,\beta )\)-downshift of \(\rho \), the second equality follows from a variable substitution, the second inequality uses \(2\beta \eta (\max \{{\rho (t)}/{\beta },1\}) \ge \eta (\max \{\rho (t),\beta \})\), which holds due to Lemma 4.1 (ii) (here we use \(\beta \le n\) to fulfill the conditions of this statement), and in the last inequality we use the definition of F and the fact that the function \(\eta \) never takes values larger than \(w_{\max }\). The above relation together with Eq. (4) implies the desired result for \(\beta < n\), which finishes the proof. \(\square \)

5 Learning rank-density curves from a sample

One of the main challenges when designing and analyzing algorithms for MSP is understanding what kind of (and how much) information can be learned about the underlying instance after observing a random sample of it.

In this section, we show that, with constant probability, after observing a sample set S containing each element with probability 0.5, one can learn a good approximation of the rank-density curve of both \(\mathcal {M}\) and \(\left. \mathcal {M}\right| _{N \setminus S}\), thus proving Theorem 3.6. However, even if one knew the exact (instead of an approximate) rank-density curve of \(\left. \mathcal {M}\right| _{N \setminus S}\), given that the matroid is not known upfront (and hence neither which elements are associated to each of the different density areas of the curve), it is a priori not clear how to proceed. A second main contribution of this section is to show that the set of elements in \(N \setminus S\) that are spanned by a subset of S of a given density is well-structured. In particular, this will allow us to build a (chain) decomposition \(\bigoplus _{i = 1}^{k} \mathcal {M}_{i}\) of \(\left. \mathcal {M}\right| _{N {\setminus } S}\) where all the \(\mathcal {M}_{i}\)’s satisfy some desired properties with constant probability — see Sect. 6.1 for details.

The main technical contribution in this section is the following result.

Theorem 5.1

Let \(\mathcal {M}= (N, \mathcal {I})\) be a matroid containing 3h disjoint bases for some \(h \in \mathbb {Z}_{\ge 1}\). Let \(S \sim B(N, {1}/{2})\). Then

$$\begin{aligned} \Pr \left[ |{\textrm{span}}(D(S, h)) \setminus S| \le \frac{|N|}{12}\right]&\le \texttt{exp}\left( -\frac{|N|}{144}\right) , \end{aligned}$$
(5)
$$\begin{aligned} \Pr \left[ r(D(S, h)) \le \frac{r(N)}{8}\right]&\le \texttt{exp}\left( -\frac{r(N)}{48}\right) . \end{aligned}$$
(6)

Proof

We prove the concentration result (5) first. Let \(\mathcal {M}_{h} = (N, \mathcal {I}_{h})\) denote the h-fold union of \(\mathcal {M}\) and let \(r_{h}\) denote its rank function. Consider the procedure described in Algorithm 1, which is loosely inspired by [16].

[Algorithm 1]
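
The pseudocode of Algorithm 1 is not reproduced above. As an illustration only, the following sketch describes one way such a procedure can be organized so that the properties listed next hold; the element order, the generic independence oracle for \(\mathcal {M}_{h}\), and the uniform-matroid demo are assumptions of this sketch and need not coincide with the original pseudocode.

```python
import random

def split_sample(ground, indep_h, order=None):
    """Sketch of a procedure in the spirit of Algorithm 1: scan the elements in a fixed
    order; each element joins the sample S with probability 1/2.  Sampled elements that
    keep W independent in the h-fold union M_h are added to W, the remaining sampled
    elements go to C, and non-sampled elements whose addition to W would be dependent
    go to G."""
    order = list(ground) if order is None else list(order)
    W, G, C = set(), set(), set()
    for e in order:
        dependent = not indep_h(W | {e})      # decided before looking at e's coin flip
        in_sample = random.random() < 0.5     # e belongs to S independently w.p. 1/2
        if dependent:
            (C if in_sample else G).add(e)
        elif in_sample:
            W.add(e)
        # non-sampled elements that keep W independent are discarded
    S = W | C
    return S, W, G, C

if __name__ == "__main__":
    # demo on a uniform matroid U_{2,12} with h = 2: a set is independent in the
    # 2-fold union iff it has at most 4 elements
    ground = range(12)
    S, W, G, C = split_sample(ground, lambda U: len(U) <= 4)
    print("S:", sorted(S), "\nW:", sorted(W), "\nG:", sorted(G), "\nC:", sorted(C))
```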

Note that the following three properties hold at all times: W, G, and C are pairwise disjoint; \(W \subseteq S\) and \(C \subseteq S\), while \(G \cap S = \emptyset \); and \(W \in \mathcal {I}_{h}\). In addition, by construction, at the end of the procedure we have:

  1. (i)

    \(S = C \uplus W\). Moreover, the random sets G and C have identical distributions, because each element belongs to S with probability \({1}/{2}\) independently of the other elements.

  2. (ii)

    \(G \subseteq {\textrm{span}}(D(S, h)) \setminus S\). Because \(G \cap S = \emptyset \), it is enough to show \(G \subseteq {\textrm{span}}(D(S, h))\). Given an arbitrary \(e \in G\), by construction we have \(W \cup \{e\} \notin \mathcal {I}_{h}\), i.e., \(r_{h}(W \cup \{e\}) = r_{h}(W)\). As \(W \subseteq S\), this yields \(r_{h}(S \cup \{e\}) = r_{h}(S)\), which then implies \(e \in {\textrm{span}}(D(S, h))\). The latter implication follows by a standard matroid argument; we provide a proof in Lemma A.1 for completeness.

As \(G\subseteq {\textrm{span}}(D(S,h))\setminus S\), and G and C have the same distribution, we get

$$\begin{aligned} \Pr \left[ |{\textrm{span}}(D(S,h))\setminus S| \le \tfrac{|N|}{12}\right] \le \Pr \left[ |G| \le \tfrac{|N|}{12}\right] = \Pr \left[ |C| \le \tfrac{|N|}{12}\right] . \end{aligned}$$
(7)

Moreover,

$$\begin{aligned} |C| = |S| - |W| \ge |S| - h r(N) \ge |S| - {|N|}/{3}, \end{aligned}$$
(8)

where the equality follows from \(S = C \uplus W\), the first inequality from \(W \in \mathcal {I}_{h}\) (which implies \(|W| = r_{h}(W) \le h r(N)\)), and the last one from the fact that \(\mathcal {M}\) contains 3h many disjoint bases (and hence \(|N| \ge 3 h r(N)\)).

Combining (7) and (8) we obtain

$$\begin{aligned} \Pr \left[ |{\textrm{span}}(D(S,h))\setminus S|\le \tfrac{|N|}{12}\right] \le \Pr \left[ |S| - \tfrac{|N|}{3} \le \tfrac{|N|}{12}\right] \le \Pr \left[ |S| \le \tfrac{5}{6} \mathbb {E}[|S|]\right] , \end{aligned}$$
(9)

where the second inequality follows from \(\mathbb {E}[|S|]={|N|}/{2}\). Relation (5) now follows by applying a Chernoff bound \(\Pr [X \le (1 - \delta )\mathbb {E}[X]] < \texttt{exp}[{-\delta ^{2} \mathbb {E}[X]}/{2}]\) for \(X = |S|\) to the right-hand side expression in (9) and using \(\mathbb {E}[|S|] = {|N|}/{2}\).

We next prove the concentration result (6). Let B be a union of 3h disjoint bases contained in \(\mathcal {M}\). We first show that for any set \(A \subseteq B\) the following inequality holds:

$$\begin{aligned} r(D(A, h)) \ge \frac{|A| - h r(A)}{2h}. \end{aligned}$$
(10)

To this end, we start by observing that, because \(\left. \mathcal {M}\right| _{B}\) is a uniformly dense matroid with density 3h, we have

$$\begin{aligned} 3h r(Q) \ge |Q| \qquad \forall Q\subseteq B. \end{aligned}$$
(11)

The above property also immediately follows by observing that, if we write \(B=B_1 \cup \cdots \cup B_{3h}\) as the union of 3h disjoint bases, then we have

$$\begin{aligned} |Q| = \sum _{i=1}^{3h} |Q \cap B_i| = \sum _{i=1}^{3h} r(Q\cap B_i) \le \sum _{i=1}^{3h} r(Q) = 3h r(Q). \end{aligned}$$

Relation (10) now holds due to the following:

$$\begin{aligned} 2h r(D(A,h)) \ge |D(A,h)| - h r(D(A,h)) \ge |A| - h r(A), \end{aligned}$$

where the first inequality is due to (11) and the second one follows from the definition of \(D(A, h)\).

Now, in order to show (6), suppose that the event \(r(D(S, h)) \le {r(N)}/{8}\) occurs. Then

$$ \frac{r(N)}{8} \ge r(D(S, h)) \ge r(D(S \cap B, h)) \ge \frac{|S \cap B| - h r(S \cap B)}{2\,h} \ge \frac{|S \cap B| - h r(N)}{2\,h}, $$

where the third inequality holds by (10). Since rearranging the terms in the above expression gives \(|S \cap B| \le {5\,h r(N)}/{4}\), we have \(\Pr \left[ r(D(S, h)) \le {r(N)}/{8}\right] \le \Pr \left[ |S \cap B| \le {5\,h r(N)}/{4}\right] \). To upper bound the latter probability, observe that \(S \cap B\) contains each element of B with probability \({1}/{2}\) independently by construction of S. Therefore we can apply a Chernoff bound \(\Pr \left[ X \le (1 - \delta ) \mathbb {E}[X]\right] \le \exp \left( -{\delta ^{2}\mathbb {E}[X]}/{2}\right) \) with \(X = |S \cap B|\), \(\mathbb {E}[X] = {|B|}/{2} = {3\,h r(N)}/{2}\), and \(\delta = {1}/{6}\) (chosen so that \((1 - \delta ) \mathbb {E}[X] = {5\,h r(N)}/{4}\)), resulting in

$$\begin{aligned} \Pr \left[ r(D(S, h)) \le \frac{r(N)}{8}\right]&\le \texttt{exp} \left( -\frac{h r(N)}{48} \right) \le \texttt{exp} \left( -\frac{r(N)}{48}\right) , \end{aligned}$$

where the last inequality holds since \(h \ge 1\). \(\square \)

The proof of Theorem 3.6 is based on the concentration result (6). In summary, rather than directly showing that \(\rho _{\left. \mathcal {M}\right| _{S}}\) approximates \(\rho _{\mathcal {M}}\) well everywhere, we consider a discrete set of points on \(\rho _{\mathcal {M}}\) associated to minors of \(\mathcal {M}\) of geometrically increasing ranks. We then apply (6) to these minors and employ a union bound to show that we get a good approximation for these grid points. The union bound works out because the ranks are geometrically increasing and appear in the exponent of the right-hand side of (6). The complete proof is presented below.

Proof of Theorem 3.6

Let \(\{\lambda _i\}_{i \in [m]}\) denote the densities (i.e., the values that are at least 1) in the image of \(\rho _{\mathcal {M}}\), and let \(\tau {:=}\max \{t>0 :\rho _{\mathcal {M}}(t) \ge 1\}\). Let \(\tilde{\rho }:\mathbb {R}_{>0} \rightarrow \mathbb {R}_{\ge 0}\) be the curve obtained from \(\rho _{\mathcal {M}}\) by rounding every density \(\lambda _i\) down to the largest power of 3 not exceeding it. That is,

$$\begin{aligned} \tilde{\rho }(t) = {\left\{ \begin{array}{ll} \max \{3^j :3^j \le \rho _{\mathcal {M}}(t), j \in \mathbb {Z}_{\ge 0}\} & {\text {if }} 0 < t \le \tau ,\\ 0 & {\text {if }} t > \tau . \end{array}\right. } \end{aligned}$$

Let \(\tilde{\lambda }_1> \cdots >\tilde{\lambda }_{\ell }\) denote the densities in the image of \(\tilde{\rho }\), and note that by construction all the \(\tilde{\lambda }_i\) are powers of 3. In particular, \(\tilde{\rho }\) is a (1, 3)-approximation of \(\rho _{\mathcal {M}}\).

Next, let \(r_{\max } :\mathbb {R}_{\ge 0} \rightarrow \mathbb {R}_{\ge 0}\) be the function given by

$$\begin{aligned} r_{\max }(\lambda ) = {\left\{ \begin{array}{ll} \max \{t: \tilde{\rho }(t) \ge \lambda \} & {\text {if }} 0 \le \lambda \le \tilde{\lambda }_1,\\ 0 & {\text {if }} \lambda > \tilde{\lambda }_1. \end{array}\right. } \end{aligned}$$

We define a subset of densities of \(\{\tilde{\lambda }_i\}_{i \in [\ell ]}\) as follows. First, set \(\mu _1=\tilde{\rho }(36)\) and \(i=1\). Then, while \(\tilde{\rho }(36 r_{\max }(\mu _i)) \ge 1\) (i.e., \(\tilde{\rho }(36 r_{\max }(\mu _i))\) is in the non-zero support of \(\tilde{\rho }\)), set \(\mu _{i+1} = \tilde{\rho }(36 r_{\max }(\mu _i))\) and update \(i=i+1\). Let \(\Lambda \) denote the subset of densities selected by the above procedure and let \(q{:=}|\Lambda |\).
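To make the selection of \(\Lambda \) concrete, the loop above can be written as the following short sketch; here `rho_tilde` and `r_max` are assumed to be available as callables implementing \(\tilde{\rho }\) and \(r_{\max }\), and \(\tilde{\rho }(36) \ge 1\) is assumed (as in the regime of interest).

```python
def select_grid_densities(rho_tilde, r_max):
    """Sketch of the grid construction in the proof of Theorem 3.6:
    mu_1 = rho_tilde(36), and then mu_{i+1} = rho_tilde(36 * r_max(mu_i))
    as long as this value stays in the non-zero support of rho_tilde."""
    Lambda = [rho_tilde(36)]                                  # mu_1
    while rho_tilde(36 * r_max(Lambda[-1])) >= 1:
        Lambda.append(rho_tilde(36 * r_max(Lambda[-1])))      # mu_{i+1}
    return Lambda
```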

Now, for each \(i \in [q]\), define \(r_i {:=}r(D(N,\mu _i))\). Note that \(r_i = r_{\max }(\mu _i)\). We then call a subset \(S \subseteq N\) good if it satisfies

$$\begin{aligned} r\left( D\left( S, \frac{\mu _i}{3}\right) \right) \ge \frac{r_i}{8} \qquad \forall i \in [q]. \end{aligned}$$
(12)

The motivation for the above definition is that any good set S satisfies that \(\rho _{\left. \mathcal {M}\right| _{S}}\) is a (288, 9)-approximation of \(\rho _{\mathcal {M}}\). To see this, first note that if S is good then \(\rho _{\left. \mathcal {M}\right| _{S}} (t) \ge {\mu _i}/{3}\) for all \(t\in (0,{r_i}/{8}]\). Next, let \(\mu _{i-1},\mu _i \in \Lambda \) and \(t\in ({r_{i-1}}/{8},{r_{i}}/{8}]\). Then \(288t > 36r_{i-1}\), and thus \(\tilde{\rho }(288t) \le \tilde{\rho }(36r_{i-1})=\mu _i\), where the last equality follows by construction of the \(\mu _i\)’s. Using that \(\tilde{\rho }\) is a (1, 3)-approximation of \(\rho _{\mathcal {M}}\), it follows that \(\rho _{\mathcal {M}}(288t)\le 3 \tilde{\rho }(288t) \le 3 \mu _i\). Combining all the above, it follows that

$$\begin{aligned} \rho _{\left. \mathcal {M}\right| _{S}} (t) \ge \frac{\mu _i}{3} \ge \frac{\rho _{\mathcal {M}}(288t)}{9} \qquad \forall \mu _{i-1},\mu _i \in \Lambda {\text { and }} t\in \left( \frac{r_{i-1}}{8},\frac{r_{i}}{8}\right] . \end{aligned}$$

Moreover, notice that for \(t\in (1,{r_1}/{8}]\) we have \(\tilde{\rho }(288t) \le \tilde{\rho }(36) = \mu _1\). Hence, using the same reasoning as above, it follows that \(\rho _{\left. \mathcal {M}\right| _{S}} (t) \ge {\mu _1}/{3} \ge {\rho _{\mathcal {M}}(288t)}/{9}\). Finally, consider the case where \(t > {r_q}/{8}\). By construction of the \(\mu _i\) we have \(\tilde{\rho }(288t) \le \tilde{\rho }(36r_q) = 0\). Hence \(\rho _{\mathcal {M}}(288t)=\tilde{\rho }(288t)=0\), since \(\tilde{\rho }\) is a (1, 3)-approximation of \(\rho _{\mathcal {M}}\) and thus \(\tilde{\rho }(a)=0 \iff \rho _{\mathcal {M}}(a)=0\) for any \(a>0\). It follows that \(\rho _{\left. \mathcal {M}\right| _{S}} (t) \ge 0 = \rho _{\mathcal {M}}(288t)\). This concludes the proof of the claim that if S is good, then \(\rho _{\left. \mathcal {M}\right| _{S}}\) is a (288, 9)-approximation of \(\rho _{\mathcal {M}}\).

Hence, in order to prove the theorem it remains to show that the probability of a set \(S\subseteq N\) being good (i.e., satisfying (12)) is at least \({1}/{100}\). We discuss this next. First, note that by Eq. (6) from Theorem 5.1, for each \(\mu _i \in \Lambda \) with \(\mu _i \ge 3\) it holds

$$\begin{aligned} \Pr \left[ r\left( D\left( S, \frac{\mu _i}{3}\right) \right) \le \frac{r_i}{8}\right] \le \exp \left( -\frac{r_i}{48}\right) . \end{aligned}$$

In the case where \(\mu _q=1\), while we cannot directly apply the concentration bound from Theorem 5.1 as a black box (since its assumptions are not met), we can still obtain the same bound as follows. Note that if \(\mu _q=1\), then \(r(D(S, {\mu _q}/{3}))\) is just r(S), and \(r_q = r(N)\). Let B be any basis of \(\mathcal {M}\), and observe that \(r(S) \ge |S \cap B|\). Since S contains every element of N independently with probability \({1}/{2}\), we can use a Chernoff bound \(\Pr \left[ X \le (1 - \delta ) \mathbb {E}[X]\right] \le \exp \left( -{\delta ^{2}\mathbb {E}[X]}/{2}\right) \) with \(X = |S \cap B|\), \(\mathbb {E}[X] = {|B|}/{2} = {r(N)}/{2}\), and \(\delta = {3}/{4}\) (chosen so that \((1 - \delta ) \mathbb {E}[X] = {r(N)}/{8}\)), resulting in

$$\begin{aligned} \Pr \left[ r\left( D\left( S, \frac{\mu _q}{3}\right) \right) \le \frac{r_q}{8}\right] = \Pr \left[ r(S) \le \frac{r(N)}{8}\right] \le \exp \left( -\frac{3^2 \cdot r_q}{4^3}\right) \le \exp \left( -\frac{r_q}{48}\right) . \end{aligned}$$

Finally, note that by construction we have \(r_1\ge 36\) and \(r_{i+1} \ge 36 r_i\) for all \(i,i+1 \in [q]\). Thus, \(r_i\ge 36^i\) for each \(i \in [q]\), and hence by the union bound it follows

$$\begin{aligned} \Pr [S {\text { not good}}]&\le \sum _{i=1}^q \exp \left( -\frac{r_i}{48}\right) \le \sum _{i=1}^q \exp \left( -\frac{36^i}{48}\right) \\ &\le \exp \left( -\frac{36}{48}\right) +2\exp \left( -\frac{36^2}{48}\right) \le \frac{99}{200}. \end{aligned}$$
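For completeness, the final numerical estimate can be verified directly:

$$\begin{aligned} \exp \left( -\frac{36}{48}\right) + 2\exp \left( -\frac{36^2}{48}\right) = e^{-3/4} + 2e^{-27} < 0.473 + 10^{-11} < \frac{99}{200}. \end{aligned}$$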

Hence,

$$\begin{aligned} \Pr [S {\text { and }} N\setminus S {\text { are both good}}] \ge 1 - 2 \cdot \frac{99}{200} = \frac{1}{100}, \end{aligned}$$

which concludes the proof. \(\square \)

6 The main algorithm and its analysis

In this section we describe and analyse the procedure from Theorem 3.5. The analysis consists of two main ingredients. The first one is to show that if the approximate curve \(\tilde{\rho }\) is well-structured (in a sense made precise below), then there is an algorithm retrieving, in expectation, a constant fraction of \(F(\tilde{\rho })\) — see Theorem 6.1. The second one is then to show that, given any initial approximate curve \(\tilde{\rho }\), one can find well-structured curves whose F-value is close to \(F(\tilde{\rho })\) — see Theorem 6.2.

The next result, proved in Sect. 6.1, formalizes the first step above.

Theorem 6.1

Let \(\mathcal {M}=(N,\mathcal {I})\) be a matroid with w-sampled weights, and let r and \(\rho _{\mathcal {M}}\) denote the rank function and rank-density curve of \(\mathcal {M}\), respectively. Let \(\overline{\rho }\le \rho _{\mathcal {M}}\) be a rank-density curve with densities \(\{\overline{\lambda }_{i}\}_{i \in [m]}\) such that the \(\overline{\lambda }_{i}\) are powers of some integer \(\beta \ge 3\) and \(\overline{\lambda }_{1}> \cdots > \overline{\lambda }_{m} \ge 1\), and such that \(r(D(N, \overline{\lambda }_{i + 1})) \ge 24 r(D(N, {\overline{\lambda }_{i}}/{\beta }))\) for \(i \in [m - 1]\). Then there is an efficient procedure \({\textrm{ALG}}(\overline{\rho }, \beta )\) that, when run on the RA-MSP subinstance given by \(\mathcal {M}\), returns an independent set I of \(\mathcal {M}\) of expected weight at least \(({1}/{180e}) F(\overline{\rho })\).

We note that the above result assumes the \(\overline{\lambda }_{i}\) to be powers of \(\beta \) mainly for convenience (so that \({\overline{\lambda }_{i}}/{\beta }\) is an integer), but it is not strictly needed.

The second main ingredient in the proof of Theorem 3.5 is the following result.

Theorem 6.2

Let \(\mathcal {M}=(N,\mathcal {I})\) be a matroid with w-sampled weights, and let r and \(\rho _{\mathcal {M}}\) denote the rank function and rank-density curve of \(\mathcal {M}\), respectively. Given an \((\alpha , \beta )\)-approximate curve \(\tilde{\rho }\) of \(\rho _{\mathcal {M}}\) with \(\alpha \in \mathbb {R}_{\ge 24}\) and \(\beta \in \mathbb {Z}_{\ge 3}\), there is a procedure \({\textrm{ALG}}(\tilde{\rho }, \alpha , \beta )\) returning rank-density curves \(\overline{\rho }, \overline{\rho }_{1}, \overline{\rho }_{2}, \overline{\rho }_{3}, \overline{\rho }_{4}\) such that:

  1. (i)

    \(\overline{\rho }\) is an \((\alpha ^2, \beta ^2)\)-approximation of \(\rho _{\mathcal {M}}\).

  2. (ii)

    \(\sum _{i \in [4]} F(\overline{\rho }_{i}) \ge F(\overline{\rho })\).

  3. (iii)

    For each \(i\in [4]\), \(\overline{\rho }_{i}\) satisfies the following properties: letting \(\{\mu _j\}_{j \in [\ell ]}\) denote the densities of \(\overline{\rho }_{i}\), all the \(\mu _j\) are powers of \(\beta \ge 3\), and \(r(D(N, \mu _{j + 1})) \ge \alpha r(D(N, {\mu _{j}}/{\beta })) \ge 24 r(D(N, {\mu _{j}}/{\beta }))\) for \(j \in [\ell - 1]\). Moreover, \(\overline{\rho }_{i} \le \rho _{\mathcal {M}}\).

Proof

We first discuss how to build the curve \(\overline{\rho }\). The goal is, on the one hand, to guarantee that the image \(\{\overline{\lambda }_i\}_{i \in [\overline{m}]}\) of the curve \(\overline{\rho }\) satisfies that each \(\overline{\lambda }_i\) is a power of \(\beta \), and moreover, that the ranks corresponding to any two consecutive densities \(\overline{\lambda }_i\) and \(\overline{\lambda }_{i+1}\) are at least an \(\alpha \) factor apart (we formalize this below). On the other hand, we want \(\overline{\rho }\) to be as close as possible to \(\tilde{\rho }\), while satisfying \(\overline{\rho }\le \tilde{\rho }\). We next discuss how to achieve these.

Let \(\{\tilde{\lambda }_i\}_{i \in [\tilde{m}]}\) denote the densities (i.e., the values that are at least 1) in the image of \(\tilde{\rho }\), and let \(\tau {:=}\max \{t>0 :\tilde{\rho }(t) \ge 1\}\). Let \(\rho ':\mathbb {R}_{>0} \rightarrow \mathbb {R}_{\ge 0}\) be the curve obtained from \(\tilde{\rho }\) by rounding down every density \(\tilde{\lambda }_i\) to the closest power of \(\beta \). That is,

$$\begin{aligned} \rho '(t) {:=}{\left\{ \begin{array}{ll} \max \{\beta ^j :\beta ^j \le \tilde{\rho }(t), j \in \mathbb {Z}_{\ge 0}\} & {\text {if }} 0 < t \le \tau ,\\ 0 & {\text {if }} t > \tau . \end{array}\right. } \end{aligned}$$

Let \(\{\lambda '_i\}_{i \in [m']}\) denote the densities in the image of \(\rho '\), and note that by construction all the \(\lambda '_i\) are powers of \(\beta \). It thus remains to guarantee that the geometric rank increase property for any two consecutive densities is satisfied. In order to achieve this, we use the following definition. Given a rank-density curve \(\rho \) with image \(\lambda _1> \cdots > \lambda _m\), we define the function \(r^{\rho }_{\max }:\mathbb {R}_{\ge 0} \rightarrow \mathbb {R}_{\ge 0}\) by

$$\begin{aligned} r^{\rho }_{\max }(\lambda ) {:=}{\left\{ \begin{array}{ll} \max \{t: \rho (t) \ge \lambda \} & {\text {if }} \lambda \le \lambda _1,\\ 0 & {\text {otherwise}}. \end{array}\right. } \end{aligned}$$

For brevity, let \(r'_{\max }{:=}r^{\rho '}_{\max }\) denote the \(r_{\max }\) function corresponding to the curve \(\rho '\).

We then build \(\overline{\rho }\) as follows. First, set \(\overline{\lambda }_1 = \lambda '_1\) and \(i=1\). Then, while \(\rho '(\alpha \cdot r'_{\max } (\overline{\lambda }_i)) \ge 1\), set \(\overline{\lambda }_{i+1} = \rho '(\alpha \cdot r'_{\max } (\overline{\lambda }_i))\) and update \(i=i+1\). Let \(\overline{\Lambda }{:=}\{\overline{\lambda }_i\}_{i \in [\overline{m}]}\) denote the densities selected by the above procedure. We define \(\overline{\rho }\) to be the curve obtained from \(\rho '\) by further rounding down densities \(\lambda '_i\) in the image of \(\rho '\) to the closest \(\overline{\lambda }\in \overline{\Lambda }\). More precisely,

$$\begin{aligned} \overline{\rho }(t) = {\left\{ \begin{array}{ll} \max \{ \overline{\lambda }\in \overline{\Lambda }: \overline{\lambda }\le \rho '(t)\} & {\text {if }} 0 < t \le \overline{\tau },\\ 0 & {\text {if }} t > \overline{\tau }, \end{array}\right. } \end{aligned}$$

where \(\overline{\tau } {:=}\overline{r}_{\max }(\overline{\lambda }_{\overline{m}}) \le \tau \).

By construction, \(\rho '\) is a \((1,\beta )\)-approximation of \(\tilde{\rho }\) and \(\overline{\rho }\) is an \((\alpha ,1)\)-approximation of \(\rho '\); hence, by Lemma 4.3, \(\overline{\rho }\) is an \((\alpha ,\beta )\)-approximation of \(\tilde{\rho }\). Combining this with the fact that \(\tilde{\rho }\) is an \((\alpha ,\beta )\)-approximation of \(\rho _{\mathcal {M}}\) proves property (i).

Next, let \(\tilde{r}_{\max }{:=}r^{\tilde{\rho }}_{\max }\) and \(\overline{r}_{\max }{:=}r_{\max }^{\overline{\rho }}\) denote the \(r_{\max }\) functions corresponding to the curves \(\tilde{\rho }\) and \(\overline{\rho }\), respectively. We show the following.

Claim 6.3

If \(|\overline{\Lambda }|\ge 5\), then for any five consecutive densities \(\overline{\lambda }_{i}, \ldots , \overline{\lambda }_{i+4} \in \overline{\Lambda }\), we have

$$\begin{aligned} r\left( D\left( N,\frac{\overline{\lambda }_i}{\beta }\right) \right) \le \alpha \cdot r'_{\max } \left( \overline{\lambda }_{i+2}\right) . \end{aligned}$$
(13)

Let \(\kappa {:=}r(D(N,{\overline{\lambda }_i}/{\beta }))\). Note that if \(\kappa \le \alpha \), then Claim 6.3 holds because \(r'_{\max }(\overline{\lambda }_{i+2})\ge 1\) by construction. Now assume \(\kappa > \alpha \). We claim that in this case we have

$$\begin{aligned} \kappa \le \alpha \cdot \tilde{r}_{\max } \left( \frac{\overline{\lambda }_i}{\beta ^2}\right) , \end{aligned}$$
(14)

which holds due to the following. Because \(\tilde{\rho }\) is an \((\alpha ,\beta )\)-approximation of \(\rho _{\mathcal {M}}\), we have

$$\begin{aligned} \tilde{\rho }(t) \ge \frac{\rho _{\mathcal {M}}(\alpha t)}{\beta } \qquad \forall t\ge 1. \end{aligned}$$

Applying this inequality with \(t={\kappa }/{\alpha }\), we get

$$\begin{aligned} \tilde{\rho }\left( \frac{\kappa }{\alpha }\right) \ge \frac{\rho _{\mathcal {M}}(\kappa )}{\beta } \ge \frac{\overline{\lambda }_i}{\beta ^2}, \end{aligned}$$
(15)

where the second inequality follows from \(\kappa {:=}r(D(N, {\overline{\lambda }_i}/{\beta }))\). Finally, Eq. (14) is an immediate implication of Eq. (15). The claim now follows due to

$$\begin{aligned} \kappa \le \alpha \cdot \tilde{r}_{\max } \left( \frac{\overline{\lambda }_i}{\beta ^2}\right) =\alpha \cdot r'_{\max } \left( \frac{\overline{\lambda }_i}{\beta ^2}\right) \le \alpha \cdot r'_{\max } \left( \overline{\lambda }_{i+2}\right) , \end{aligned}$$

where the first inequality is due to (14), the first equality follows by construction of \(\rho '\) and the fact that \({\overline{\lambda }_i}/{\beta ^2}\) is a power of \(\beta \) (because \(\overline{\lambda }_i\) is a power of \(\beta \) and \(\overline{\lambda }_i \ge \beta ^4 \overline{\lambda }_{i+4} \ge \beta ^4\)), and the second inequality again follows since by construction \({\overline{\lambda }_i}/{\beta ^2} \ge \overline{\lambda }_{i+2}\).

Using Claim 6.3, we can further upper bound \(r(D(N,\frac{\overline{\lambda }_i}{\beta }))\) as follows

$$\begin{aligned} r\left( D\left( N,\frac{\overline{\lambda }_i}{\beta }\right) \right)&\le \alpha \cdot r'_{\max } \left( \overline{\lambda }_{i+2}\right) = \alpha \cdot \overline{r}_{\max } (\overline{\lambda }_{i+2}) \nonumber \\&\le \frac{1}{\alpha } \cdot \overline{r}_{\max } (\overline{\lambda }_{i+4}) \le \frac{1}{\alpha } \cdot r( D(N,\overline{\lambda }_{i+4}) ), \end{aligned}$$
(16)

where the first inequality is due to Claim 6.3, the equality holds because \(r'_{\max } (\overline{\lambda }) = \overline{r}_{\max } (\overline{\lambda })\) for every \(\overline{\lambda }\in \overline{\Lambda }\) by construction, the second inequality holds because by construction \(\overline{r}_{\max } (\overline{\lambda }_{j+1}) \ge \alpha \cdot \overline{r}_{\max } (\overline{\lambda }_{j})\) for any two consecutive densities \(\overline{\lambda }_{j}, \overline{\lambda }_{j+1} \in \overline{\Lambda }\), and the last inequality follows from \(\overline{\rho }\le \rho _{\mathcal {M}}\).

We now build the curves \(\{\overline{\rho }_i\}_{i \in [4]}\) as follows. For \(i\in [4]\), let

$$\begin{aligned} \overline{\Lambda }_i {:=}\left\{ \overline{\lambda }_j \in \overline{\Lambda }: j \equiv i-1 \pmod {4} \right\} . \end{aligned}$$
(17)

If \(\overline{\Lambda }_i = \emptyset \), let \(\overline{\rho }_i\) be (by convention) the curve given by \(\overline{\rho }_i (t) = 1\) for \(0 < t \le \overline{\tau }\), and \(\overline{\rho }_i (t)=0\) for \(t > \overline{\tau }\). Otherwise, we define \(\overline{\rho }_i\) to be the curve obtained from \(\overline{\rho }\) by further rounding down densities \(\overline{\lambda }\in \overline{\Lambda }\) to the closest density in \(\overline{\Lambda }_i\). More precisely,

$$\begin{aligned} \overline{\rho }_i (t) {:=}{\left\{ \begin{array}{ll} \max \{ \mu \in \overline{\Lambda }_i: \mu \le \overline{\rho }(t)\} & {\text {if }} 0 < t \le \overline{\tau }_i,\\ 0 & {\text {if }} t > \overline{\tau }_i, \end{array}\right. } \end{aligned}$$

where \(\overline{\tau }_i {:=}\overline{r}_{\max } (\min \{\mu : \mu \in \overline{\Lambda }_i\})\).
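As a small illustration of the partition in (17), the four index classes can be computed as follows (a sketch only; `Lambda_bar` stands for the list of densities \(\overline{\lambda }_1> \cdots > \overline{\lambda }_{\overline{m}}\)).

```python
def split_into_classes(Lambda_bar):
    """Sketch of (17): Lambda_bar_i collects the densities lambda_bar_j whose
    index j satisfies j = i - 1 (mod 4), for i = 1, 2, 3, 4."""
    return [
        [lam for j, lam in enumerate(Lambda_bar, start=1) if j % 4 == (i - 1) % 4]
        for i in range(1, 5)
    ]

# For instance, with six densities [l1, ..., l6] the classes are
# [[l4], [l1, l5], [l2, l6], [l3]].
```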

To see property (ii), note that by construction we have \(\cup _{i \in [4]} \overline{\Lambda }_{i} = \overline{\Lambda }\), and that \(\overline{\rho }_{i}\) takes the value \(\overline{\lambda }_{j} \in \overline{\Lambda }_{i}\) precisely on the interval \((\overline{r}_{\max }(\overline{\lambda }_{j-4}), \overline{r}_{\max }(\overline{\lambda }_{j})]\), since the next larger density in \(\overline{\Lambda }_{i}\) is \(\overline{\lambda }_{j-4}\). By setting \(\overline{r}_{\max } (\overline{\lambda }_{i}) = 0\) for all \(i \le 0\) by convention, we get

$$\begin{aligned} \sum _{i = 1}^{4} F(\overline{\rho }_{i})&= \sum _{i = 1}^{\overline{m}} (\overline{r}_{\max }(\overline{\lambda }_{i}) - \overline{r}_{\max }(\overline{\lambda }_{i - 4})) \eta (\overline{\lambda }_{i}) \\ &\ge \sum _{i = 1}^{\overline{m}} (\overline{r}_{\max }(\overline{\lambda }_{i}) - \overline{r}_{\max }(\overline{\lambda }_{i - 1})) \eta (\overline{\lambda }_{i}) = F(\overline{\rho }). \end{aligned}$$

Finally, to see property (iii), first note that by construction we have \(\overline{\rho }_i \le \overline{\rho }\le \rho ' \le \rho _{\mathcal {M}}\) for each \(i \in [4]\), and moreover, all the curves \(\overline{\rho }_i\) consist of densities that are powers of \(\beta \). In addition, the geometric rank increase property holds trivially if \(|\overline{\Lambda }| \le 4\). Otherwise, it follows directly from Eq. (16) and the fact that \(\alpha \ge 24\). \(\square \)

We now show how Theorems 6.1 and 6.2 combined imply Theorem 3.5.

Proof of Theorem 3.5

Given an \((\alpha , \beta )\)-approximation \(\tilde{\rho }\) of \(\rho _{\mathcal {M}}\), first run the procedure from Theorem 6.2 to get curves \(\overline{\rho },\overline{\rho }_1,\overline{\rho }_2,\overline{\rho }_3,\overline{\rho }_4\). Then choose an index \(i\in [4]\) uniformly at random and run the procedure from Theorem 6.1 on \(\overline{\rho }_i\) to get an independent set with expected weight at least

$$\begin{aligned} \frac{1}{180e} \left( \frac{1}{4} \sum _{i=1}^4 F(\overline{\rho }_{i}) \right) \ge \frac{1}{720e} F(\overline{\rho }) \ge \frac{1}{1440 e \alpha ^2 \beta ^2} \left( F(\rho _{\mathcal {M}}) - \alpha ^2 w_{\max } \right) , \end{aligned}$$

where the last inequality uses Lemma 3.4 and the fact that \(\overline{\rho }\) is an \((\alpha ^{2}, \beta ^{2})\)-approximation of \(\rho _{\mathcal {M}}\). \(\square \)

Thus, to complete the proof of Theorem 3.5, it remains to prove Theorem 6.1.

6.1 Proof of Theorem 6.1

Throughout this section we use the notation and assumptions from Theorem 6.1.

We prove the theorem in two steps. First, we show that after observing a sample set S, we can build a chain \(\bigoplus _{i = 1}^{m} \mathcal {M}_{i}\) of \(\left. \mathcal {M}\right| _{N {\setminus } S}\) satisfying certain properties with at least constant probability. Then we show that, given such a chain, there is a procedure returning an independent set I of \(\mathcal {M}\) with \(\mathbb {E}[w(I)] = \Omega (F(\overline{\rho }))\), leading to the desired result. We start by discussing the former claim.

Given a sample set \(S \subseteq N\), we build a chain of matroids as follows. For \(i \in [m]\) let

$$\begin{aligned} \begin{aligned} N_{i}&{:=}{\textrm{span}}\left( D\left( S, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \setminus \left( S \cup {\textrm{span}}\left( D\left( S, \frac{\overline{\lambda }_{i - 1}}{\beta }\right) \right) \right) {\text{, } \text{ and } }\\ \mathcal {M}_{i}&{:=}\left. \left( \mathcal {M} / {\textrm{span}}\left( D\left( S, \frac{\overline{\lambda }_{i - 1}}{\beta }\right) \right) \right) \right| _{N_{i}}, \end{aligned} \end{aligned}$$
(18)

where \(D(S, {\overline{\lambda }_{0}}/{\beta }) = \emptyset \) by convention.
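The ground sets of this chain can be sketched in code as follows; here `span` and `D` are assumed to be callables implementing the span operator of \(\mathcal {M}\) and the map \(D(\cdot ,\cdot )\), so this is an illustration of (18) rather than an implementation (the matroids \(\mathcal {M}_{i}\) themselves, obtained by contraction and restriction, are omitted).

```python
def chain_groundsets(S, lambdas, beta, span, D):
    """Sketch of the ground sets N_1, ..., N_m from Eq. (18): N_i consists of
    the elements spanned by D(S, lambda_i / beta) that are neither sampled nor
    already spanned by D(S, lambda_{i-1} / beta)."""
    groundsets = []
    prev_span = span(set())          # D(S, lambda_0 / beta) = emptyset by convention
    for lam in lambdas:              # lambdas = [lambda_1, ..., lambda_m]
        cur_span = span(D(S, lam / beta))
        groundsets.append((cur_span - S) - prev_span)   # N_i
        prev_span = cur_span
    return groundsets
```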

In addition, for every \(i \in [m]\) let \(\overline{N}_{i} {:=}D(N, \overline{\lambda }_{i})\), and define \(\overline{\Lambda }{:=}\{i \in [m] :r(\overline{N}_{i}) \ge 24, \ \overline{\lambda }_{i} \ge \beta \}\). Note that \(\overline{\Lambda }\) and the \(\overline{N}_{i}\)’s do not depend on the sample set S. Moreover, from the assumptions of Theorem 6.1 it follows that \(\overline{\Lambda }\supseteq [m] {\setminus } \{1,m\}\). The next result shows that with constant probability, the sample set S is such that for each \(i \in \overline{\Lambda }\), the set \(N_i\) contains a subset \(U_i\) of large rank and density; more precisely, \(r(U_i) \ge \Omega (r(\overline{N}_i))\) and \({|U_i|}/{r(U_i)} \ge \Omega (\overline{\lambda }_i)\).

Lemma 6.4

Let \(S \sim B(N, {1}/{2})\), and let \(N_i\), \(\overline{N}_i\), and \(\overline{\Lambda }\) be as defined above. Then, with probability at least \({1}/{3}\), every \(N_{i}\) with \(i \in \overline{\Lambda }\) contains \(\overline{\lambda }_{i}\) disjoint independent sets \(I_{1}, \ldots , I_{\overline{\lambda }_{i}}\) such that \(\sum _{j=1}^{\overline{\lambda }_{i}} |I_{j}| \ge ({1}/{24}) \overline{\lambda }_{i} r(\overline{N}_{i})\).

Proof

Let \(B_{i} \subseteq \overline{N}_{i}\) denote the union of (any) \(\overline{\lambda }_{i}\) disjoint bases of \(\left. \mathcal {M}\right| _{\overline{N}_{i}}\). We then say that a sample set S is good if it satisfies

$$\begin{aligned} \left| \left( {\textrm{span}}\left( D\left( S \cap B_{i}, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \cap B_{i}\right) \setminus S\right| \ge \frac{1}{12} \cdot |B_{i}| \qquad \forall i \in \overline{\Lambda }. \end{aligned}$$
(19)

The motivation for the above definition is that any good sample set S leads to a matroid chain (as defined in Eq. (18)) that satisfies the properties claimed in the lemma. To see this, note that for every \(i \in \overline{\Lambda }\cap [m - 1]\) we have

$$\begin{aligned} \left| {\textrm{span}}\left( D\left( S, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \cap B_{i + 1}\right|&\le \left| {\textrm{span}}\left( D \left( N, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \cap B_{i + 1}\right| \\&\le \overline{\lambda }_{i + 1} r\left( D\left( N, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \\&\le \frac{\overline{\lambda }_{i + 1}}{24} r(\overline{N}_{i + 1}), \end{aligned}$$

where the last inequality holds since \(r(D(N, \overline{\lambda }_{i + 1})) \ge 24 r(D(N, {\overline{\lambda }_{i}}/{\beta }))\) by the assumptions of Theorem 6.1. Moreover, since \(D(S, {\overline{\lambda }_{0}}/{\beta }) = \emptyset \) by convention, the same bound holds for \(i = 0\), i.e., \(|{\textrm{span}}\left( D\left( S, {\overline{\lambda }_{0}}/{\beta }\right) \right) \cap B_{1}| \le ({\overline{\lambda }_{1}}/{24}) r(\overline{N}_{1})\). Thus, for any \(i \in \overline{\Lambda }\) we get

$$\begin{aligned} |N_{i} \cap B_{i}|&\ge \left| \left( {\textrm{span}}\left( D\left( S \cap B_{i}, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \cap B_{i}\right) \setminus S\right| - \left| {\textrm{span}}\left( D\left( S, \frac{\overline{\lambda }_{i - 1}}{\beta }\right) \right) \cap B_{i}\right| \\&\ge \frac{1}{12} |B_{i}| - \frac{1}{24} \overline{\lambda }_{i} r(\overline{N}_{i})\\&= \frac{1}{24} \overline{\lambda }_{i} r(\overline{N}_{i}). \end{aligned}$$

Finally, since \(B_{i}\) is a union of \(\overline{\lambda }_{i}\) disjoint bases, the set \(N_{i} \cap B_{i}\) is a union of \(\overline{\lambda }_{i}\) disjoint independent sets (contained in \(N_{i}\)), and hence the claim follows.

Thus, in order to prove the lemma, it only remains to show that \(\Pr \left[ S {\text { is good}}\right] \ge {1}/{3}\). To see this, first observe that, since for all \(i\in \overline{\Lambda }\) we have \(\overline{\lambda }_{i} = \beta ^j\) for some \(j \ge 1\) and \(\beta \ge 3\), it holds that

$$\begin{aligned} \frac{\overline{\lambda }_{i}}{\beta } = \left\lfloor \frac{\overline{\lambda }_{i}}{\beta }\right\rfloor \le \left\lfloor \frac{\overline{\lambda }_{i}}{3}\right\rfloor . \end{aligned}$$
(20)

Then, by applying (5) from Theorem 5.1 to every \(\left. \mathcal {M}\right| _{B_{i}}\) with \(i \in \overline{\Lambda }\), we can bound the probability of S not being good because of index \(i\in \overline{\Lambda }\) by

$$\begin{aligned} \Pr&\left[ \left| \left( {\textrm{span}}\left( D\left( S \cap B_{i}, \frac{\overline{\lambda }_{i}}{\beta }\right) \right) \cap B_{i}\right) \setminus S\right| \le \frac{1}{12} |B_{i}|\right] \\&\le \Pr \left[ \left| \left( {\textrm{span}}\left( D\left( S \cap B_{i}, \left\lfloor \frac{\overline{\lambda }_{i}}{3} \right\rfloor \right) \right) \cap B_{i}\right) \setminus S\right| \le \frac{1}{12} |B_{i}|\right] \\&\le \exp \left( -\frac{|B_{i}|}{144}\right) \\&\le \exp \left( -\frac{r(\overline{N}_{i})}{48}\right) , \end{aligned}$$

where the first inequality follows from Eq. (20), the second from Theorem 5.1 and the fact that for all \(i \in \overline{\Lambda }\) the matroid \(\left. \mathcal {M}\right| _{B_{i}}\) contains 3h disjoint bases with \(h=\lfloor {\overline{\lambda }_{i}}/{3} \rfloor \ge 1\), and the last inequality holds since \(|B_{i}| = \overline{\lambda }_{i} r(\overline{N}_{i})\) and \(\overline{\lambda }_{i} \ge \beta \ge 3\) for all \(i \in \overline{\Lambda }\).

We can now upper bound the probability of S not being good by a union bound over \(i\in \overline{\Lambda }\), using that \(r(\overline{N}_{i}) \ge 24\) for \(i\in \overline{\Lambda }\) and that these ranks increase geometrically (by the assumptions of Theorem 6.1, \(r(\overline{N}_{i+1}) = r(D(N, \overline{\lambda }_{i+1})) \ge 24\, r(D(N, {\overline{\lambda }_{i}}/{\beta })) \ge 24\, r(\overline{N}_{i})\)), so that the j-th smallest of them is at least \(24^j\):

$$\begin{aligned} \sum _{i \in \overline{\Lambda }} \exp \left( -\frac{r(\overline{N}_{i})}{48}\right) \le \sum _{i=1}^m \exp \left( -\frac{24^i}{48}\right) \le \exp \left( -\frac{24}{48}\right) + 2 \exp \left( -\frac{24^2}{48}\right) \le \frac{2}{3}. \end{aligned}$$

Hence, \(\Pr \left[ S {\text { is good}}\right] \ge 1 - {2}/{3} = {1}/{3}\), as desired. \(\square \)

The second main ingredient in the proof is to show that the above result can be exploited algorithmically. More precisely, we prove the following.

Lemma 6.5

Let \(\mathcal {M}=(N,\mathcal {I})\) be a matroid with w-sampled weights that contains h disjoint independent sets \(I_{1}, \dots , I_{h}\) such that \(s {:=}({1}/{h}) \sum _{j = 1}^{h} |I_{j}| \ge 1\). Then there is a procedure that, when run on the RA-MSP subinstance given by \(\mathcal {M}\), and with only h given upfront, returns an independent set of \(\mathcal {M}\) with expected weight at least \(({s}/{2e}) \eta (h)\). This is still the case even if the elements of \(\mathcal {M}\) are revealed in adversarial (rather than uniformly random) order.

Proof

Suppose we run the online selection procedure (OSP) described in Algorithm 2 on the RA-MSP subinstance given by \(\mathcal {M}\), and with parameter h (as defined in the statement of Lemma 6.5) as input.

[Algorithm 2: the online selection procedure \({\textrm{OSP}}(\mathcal {M}, h)\); pseudocode figure not reproduced here.]
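Since the pseudocode of Algorithm 2 is not reproduced above, the following minimal sketch describes an online selection procedure that matches the analysis below; the batching of eligible elements into groups of h and the classical secretary rule within each group are taken from that analysis, while the interface (an independence oracle `is_independent` and a stream of (element, weight) pairs) is an assumption of this sketch rather than the paper's verbatim pseudocode.

```python
import math

def osp(is_independent, stream, h):
    """Sketch of OSP(M, h): elements that can extend the current solution are
    grouped into consecutive batches of h, and within each batch the classical
    secretary rule (observe roughly a 1/e fraction, then take the first element
    beating the best observed weight) picks at most one element irrevocably."""
    I = []                                # selected elements (independent in M)
    base = list(I)                        # solution at the start of the current batch
    seen, picked = 0, False
    observe = math.floor(h / math.e)      # length of the observation phase
    best_observed = float("-inf")

    for e, w in stream:
        if not is_independent(base + [e]):
            continue                      # e cannot extend the batch's base solution
        seen += 1
        if seen <= observe:
            best_observed = max(best_observed, w)        # observation phase
        elif not picked and w > best_observed:
            I.append(e)                   # pick e irrevocably
            picked = True
        if seen == h:                     # batch complete: reset for the next one
            base = list(I)
            seen, picked, best_observed = 0, False, float("-inf")
    return I
```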

Suppose that OSP successfully completed its i-th iteration (where \(i \in \{0, \dots , r - 1\}\)) and is about to begin the \((i + 1)\)-st iteration. Let Z denote the set of elements seen so far and let I be the set of elements picked so far. Additionally, let \(T {:=}\bigcup _{j=1}^{h} I_{j}\) and \(J {:=}\{e \in N :I \cup \{e\} \in \mathcal {I}\}\). First, note that \(I \in \mathcal {I}\) by construction. Now, observe that

$$ |I_{j} \cap J| \ge \max \left\{ 0, |I_{j}| - |I|\right\} \ge |I_{j}| - i \qquad \forall j \in [h], $$

since \(I_{j}, I \in \mathcal {I}\) and \(|I| \le i\). Therefore \(|T \cap J| \ge (s - i)h\). Moreover, since the classical secretary algorithm was executed on i groups of h elements, we have \(|(T \cap Z) \cap J| \le ih\) and hence

$$ |(N {\setminus } Z) \cap J| \ge |(T {\setminus } Z) \cap J| = |T \cap J| - |(T \cap Z) \cap J| \ge (s - 2i) h. $$

Thus, if \(s \ge 2i + 1\), the number \(|(N\setminus Z) \cap J|\) of elements that Algorithm 2 can feed to the classical secretary algorithm in iteration \(i+1\) is at least h, which implies that it successfully completes its \((i + 1)\)-st iteration; hence the total number of successful iterations is at least \(\lfloor {(s - 1)}/{2} \rfloor + 1 \ge \max \{1, \, s / 2\}\).

Finally, note that the expected weight of the element picked in each successful iteration is at least \(\eta (h) / e\). This follows since (by definition of \(\eta \)) the expected weight of the heaviest element in each successful iteration is \(\eta (h)\). Moreover, even though the arrival order is adversarial, due to the random assignment of the weights, the classical secretary algorithm is applied to h weights drawn uniformly at random from w. Since the classical secretary algorithm returns, in expectation, at least a \({1}/{e}\)-fraction of the heaviest weight, the claimed factor of \(\eta (h) / e\) follows. Combining this with the lower bound on the number of successful iterations gives the desired result. \(\square \)

We can now combine Lemmas 6.4 and 6.5 to prove Theorem 6.1 as follows.

Proof of Theorem 6.1

Let \({\textrm{OSP}}(\mathcal {M},h)\) denote the online selection procedure from Lemma 6.5 (i.e., Algorithm 2). Additionally, for \(i\in [m]\), let \(r_{i}\) denote the coefficient of \(\eta (\overline{\lambda }_{i})\) in \(F(\overline{\rho })\). Hence, \(F(\overline{\rho }) = \sum _{i = 1}^{m} r_{i} \eta (\overline{\lambda }_{i})\). Consider the following algorithm: choose and execute one of the three branches presented below with probability \({12}/{15}\), \({2}/{15}\), and \({1}/{15}\), respectively.

  1. (i)

    Observe \(S \sim B(N, {1}/{2})\), construct the chain \(\bigoplus _{i = 1}^{m} \mathcal {M}_{i}\) as defined in (18), and run \({\textrm{OSP}}(\mathcal {M}_{i}, \overline{\lambda }_{i})\) for every \(i \in [m]\) (independently in parallel), returning all the picked elements.

  2. (ii)

    Run the classical secretary algorithm on \(\mathcal {M}\) and return the picked element (if any).

  3. (iii)

    Run \({\textrm{OSP}}(\mathcal {M}, 1)\) without observing anything and return all picked elements.

Suppose we execute branch (i). By Lemma 6.4, with probability at least \({1}/{3}\), every \(\mathcal {M}_{i}\) with \(i \in \overline{\Lambda }\) satisfies the conditions of Lemma 6.5 with parameters \(h = \overline{\lambda }_{i}\) and \(s = ({1}/{24}) r(\overline{N}_{i})\). Note that \(s \ge 1\) holds given that \(r(\overline{N}_{i}) \ge 24\) for all \(i \in \overline{\Lambda }\). Since, in addition, the matroids in the chain form a direct sum, executing the first branch of the algorithm returns, by Lemma 6.5, an independent set of expected weight at least

$$ \frac{1}{3} \sum _{i \in \overline{\Lambda }} \frac{1}{2e} \cdot \frac{r(\overline{N}_{i})}{24} \eta (\overline{\lambda }_{i}) = \frac{1}{144e} \sum _{i \in \overline{\Lambda }} r(\overline{N}_{i}) \eta (\overline{\lambda }_{i}) \ge \frac{1}{144e} \sum _{i \in \overline{\Lambda }} r_i \eta (\overline{\lambda }_{i}), $$

where the inequality follows from \(\overline{\rho }\le \rho _{\mathcal {M}}\) and \(\overline{N}_{i} = D(N, \overline{\lambda }_{i})\) for every \(i \in [m]\).

Therefore, if \(i \in \overline{\Lambda }\), then the corresponding term \(r_{i} \eta (\overline{\lambda }_{i})\) in \(F(\overline{\rho })\) is accounted for by branch (i). Thus it only remains to consider \(i \in [m] {\setminus } \overline{\Lambda }\subseteq \{1,m\}\).

Assume first that \(1 \notin \overline{\Lambda }\). In this case, we must have \(r(\overline{N}_1)<24\). Since the expected weight yielded by running the classical secretary algorithm is at least \({\eta (|N|)}/{e}\), and \(\eta (|N|) \ge \eta (\overline{\lambda }_{1})\), then, by running branch (ii), the expected weight of the output set is at least

$$\begin{aligned} \frac{\eta (|N|)}{e} \ge \frac{1}{e} \cdot \frac{r(\overline{N}_1) \eta (\overline{\lambda }_{1})}{r(\overline{N}_1)} \ge \frac{1}{23e} r_1 \eta (\overline{\lambda }_{1}), \end{aligned}$$

where the last inequality follows from \(r_1 \le r(\overline{N}_1) \le 23\).

Finally, assume that \(m \notin \overline{\Lambda }\). Then \(\overline{\lambda }_{m} = 1\), in which case running branch (iii) yields

$$\begin{aligned} \mathbb {E}[w({\textrm{OSP}}(\mathcal {M}, 1))] \ge \frac{1}{2e} r(N) \eta (1) \ge \frac{1}{2e} r(\overline{N}_{m}) \eta (\overline{\lambda }_{m}), \end{aligned}$$

where the first inequality holds by Lemma 6.5 with \(h = 1\) and \(s = r(N) \ge 1\), as any basis of \(\mathcal {M}\) is an independent set of rank r(N), and the second inequality holds because \(r(N) \ge r(\overline{N}_{m})\) and \(\overline{\lambda }_{m} = 1\).

The desired lower bound on the expected weight of the set returned by the algorithm now follows by combining the above results with the respective probabilities that each branch is executed. \(\square \)

Finally, we argue that our main result (i.e., Theorem 1.2) still holds in the more general adversarial order with a sample setting, where we are allowed to sample a set \(S \subseteq N\) containing every element of N independently with probability \({1}/{2}\), and the remaining (non-sampled) elements arrive in adversarial order.

In order to see this, first note that the only place in the proof of Theorem 6.1 where we use that the non-sampled elements (i.e., \(N\setminus S\)) arrive in random order is to argue that, when running the classical secretary algorithm in branch (ii), we obtain an expected weight of at least \({w_{\max }}/{e}\). Indeed, branches (i) and (iii) rely on running the procedure from Lemma 6.5, whose guarantees hold even when the elements arrive in adversarial order. However, note that running the classical secretary procedure in the above adversarial order with a sample setting outputs an element with expected weight at least \({w_{\max }}/{4}\). Indeed, the probability of selecting \(w_{\max }\) in the latter setting is at least the probability of the event that \(w_{\max }\) is not sampled and the second largest weight is, which occurs with probability \({1}/{4}\). Thus, Theorem 6.1 holds (up to possibly a slightly worse constant) in the adversarial order with a sample setting.

Next, observe that this implies that Theorem 3.5 also holds in the above setting (again, up to possibly a slightly worse constant). This follows because its proof relies on combining the procedures from Theorems 6.1 and 6.2, and the latter is completely oblivious to the arrival order of the elements.

Finally, note that the proof of Theorem 1.2 uses the procedure from Theorem 6.1 and the classical secretary algorithm. Because (as discussed above) both of these algorithms have very similar guarantees in the adversarial order with a sample setting to the ones shown in this paper for random order, the claim follows.

6.2 Full algorithm and adversarial order with a sample setting

We next summarize the full algorithm used to obtain Theorem 1.2. First, the algorithm executes one of the following two branches uniformly at random:

  1. (1)

    Run the classical secretary algorithm on N.

  2. (2)

    Run the following procedure:

    • Sample a set S (without selecting anything) containing every element of N with probability \({1}/{2}\) independently.

    • Define \(\tilde{\rho }\) to be the (288, 9)-downshift of \(\rho _{\left. \mathcal {M}\right| _{S}}\).

    • Run the procedure of Theorem 3.5 on the remaining non-sampled elements (i.e., on the matroid \(\left. \mathcal {M}\right| _{N \setminus S}\)) using as input the curve \(\tilde{\rho }\), and parameters \(\alpha = 288^2\) and \(\beta = 9^2\).

The procedure from Theorem 3.5 used above consists of two main steps: first, it runs the algorithm from Theorem 6.2 on the curve \(\tilde{\rho }\) with parameters \(\alpha = 288^2\) and \(\beta = 9^2\), to find well-structured curves \(\overline{\rho }_1, \overline{\rho }_2, \overline{\rho }_3\), and \(\overline{\rho }_4\). Then, it selects one \(\overline{\rho }_j\) out of these four curves uniformly at random, and runs the procedure from Theorem 6.1 on the matroid \(\left. \mathcal {M}\right| _{N \setminus S}\) using this \(\overline{\rho }_j\) and \(\beta = 9^2\) as input. The latter algorithm consists of executing one of the following three branches with probability \({2}/{15}\), \({1}/{15}\), and \({12}/{15}\), respectively:

  1. (2.i)

    Run the classical secretary algorithm on \(N \setminus S\).

  2. (2.ii)

    Run \({\textrm{OSP}}(\left. \mathcal {M}\right| _{N \setminus S}, 1)\).

  3. (2.iii)

    Sample a set \(S'\) (without selecting anything) containing every element of \(N \setminus S\) with probability \({1}/{2}\) independently, construct the chain \(\bigoplus _{i = 1}^{m} \mathcal {M}_{i}\) as defined in (18) using \(\overline{\rho }_j\) and \(\beta = 9^2\) as input, and run \({\textrm{OSP}}(\mathcal {M}_{i}, \overline{\lambda }_{i})\) for every \(i \in [m]\).

The above completes the description of the full algorithm. In summary, the algorithm executes one of the following four options: branch (1) with probability \({1}/{2}\), branch (2.i) with probability \({1}/{15}\), branch (2.ii) with probability \({1}/{30}\), and branch (2.iii) with probability \({2}/{5}\). Hence, either the classical secretary algorithm is executed (on branch (1) or (2.i)), or the \({\textrm{OSP}}\) procedure from Algorithm 2 is executed (on branch (2.ii) or (2.iii)). The guarantees of the latter hold even if the elements arrive in adversarial order — see Lemma 6.5. Moreover, because of the random assignment setting, the standard guarantees for the classical secretary algorithm still hold under adversarial arrival order. Indeed, since weights are assigned uniformly at random to elements after the arrival order has been fixed, this setting is equivalent to the one where weights are assigned to elements adversarially but the arrival order is uniformly random.
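The top-level randomization over the four branches (with the combined probabilities stated above) can be summarized as follows; the four subroutines themselves are omitted, so this is only a sketch of which branch gets executed.

```python
import random

def choose_branch():
    """Pick which of the four branches of the full algorithm is executed,
    with probabilities 1/2, 1/15, 1/30 and 2/5 (which sum to 1)."""
    branches = ["(1)", "(2.i)", "(2.ii)", "(2.iii)"]
    weights = [1 / 2, 1 / 15, 1 / 30, 2 / 5]
    return random.choices(branches, weights=weights, k=1)[0]
```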

We now discuss how to adapt the above procedure and its analysis to the adversarial order with a sample setting. In this setting, the algorithm must specify a (possibly random) sampling probability \(p \in [0,1]\). The instance is then revealed to the algorithm in two phases. First, a random set S containing every element of N with probability p independently is provided to the algorithm. However, the algorithm is not allowed to select any element from S. In the second phase, the elements of \(N \setminus S\) are then revealed in an adversarial order.

Our modified algorithm works as follows. First, it chooses one of the four options with the respective (i.e., \({1}/{2}\), \({1}/{15}\), \({1}/{30}\), and \({2}/{5}\)) probabilities. Then:

  • If branch (1) is chosen, the algorithm sets \(p={1}/{e}\) for the sampling phase to obtain a sample set S. It then uses this sample set S to choose a threshold, and picks the first element arriving in \(N \setminus S\) with weight above the threshold (if any). As discussed above, since for the classical secretary problem the random-assignment adversarial-order setting is equivalent to the standard adversarial-assignment random-order setting, this procedure outputs an element of expected weight at least \({w_{\max }}/{e}\), and is thus equivalent to running branch (1) of the original algorithm.

  • If branch (2.i) is chosen, the algorithm sets \(p={(e+1)}/{2e}\) for the sampling phase to obtain a sample set S. This is equivalent to sampling first over N with probability \({1}/{2}\) as done in (2), and then sampling another \({1}/{e}\)-fraction over the remaining elements as done in (2.i). Let \(S'\) be a random subset of S obtained by subsampling each element of S with probability \({1}/{(e+1)}\) independently. Note that \(S'\) is then a random set containing each element of N with probability \({1}/{2e}\) independently. We then use \(S'\) to set a threshold, and select the first element arriving in \(N {\setminus } S\) with weight above the threshold (if any). This procedure is equivalent to running branch (2.i) in the original algorithm.

  • If branch (2.ii) is chosen, the algorithm sets \(p={1}/{2}\) to obtain a sample set S, and then runs \({\textrm{OSP}}(\left. \mathcal {M}\right| _{N \setminus S}, 1)\) on the remaining non-sampled elements. This does not have any impact on the uniform assignment of the weights w to the elements, since it only depends on the sampled elements S but not the weights of its elements. Thus, the RA-MSP subinstance given by \(\left. \mathcal {M}\right| _{N\setminus S}\) on which we use OSP indeed assigns weights of w uniformly at random to elements, as required. Since the guarantees of OSP hold under adversarial arrival order, this is equivalent to running branch (2.ii) in the original algorithm.

  • Finally, if branch (2.iii) is chosen, the algorithm sets \(p={3}/{4}\) to obtain a sample set \(\overline{S}\). Let \(S'\) be a random subset of \(\overline{S}\) obtained by subsampling each element of \(\overline{S}\) with probability \({2}/{3}\) independently. Note that \(S'\) is then a random set containing each element of N with probability \({1}/{2}\) independently, while \(\tilde{S} {:=}\overline{S} \setminus S'\) is a random set containing each element of N with probability \({1}/{4}\) (these sampling probabilities, and those of branch (2.i), are verified right after this list). We then use \(S'\) to simulate the sample set S used in branch (2) of the original algorithm, and \(\tilde{S}\) to simulate the sample set \(S'\) used in branch (2.iii) of the original algorithm. As discussed in the case above, this construction does not have any impact on the uniform assignment of the weights w to the elements. Thus, the RA-MSP subinstance given by \(\left. \mathcal {M}\right| _{N\setminus \overline{S}}\) on which we use OSP indeed assigns weights of w uniformly at random to elements, as required. Finally, using that the guarantees of OSP hold under adversarial arrival order, it follows that this procedure is equivalent to running branch (2.iii) in the original algorithm.
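For completeness, the sampling probabilities used in branches (2.i) and (2.iii) above combine as claimed:

$$\begin{aligned} \text {(2.i):}\quad \frac{1}{2} + \frac{1}{2}\cdot \frac{1}{e} = \frac{e+1}{2e}, \qquad \frac{e+1}{2e}\cdot \frac{1}{e+1} = \frac{1}{2e}; \qquad \text {(2.iii):}\quad \frac{3}{4}\cdot \frac{2}{3} = \frac{1}{2}, \qquad \frac{3}{4}\cdot \frac{1}{3} = \frac{1}{4}. \end{aligned}$$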