1 Introduction

Consider a collection X comprising n items, with a subset \(I \subseteq X\) representing the defective items. Within group testing, a test involves examining a subset \(Q \subseteq X\), yielding a result of 1 if any defective item is present (i.e., \(Q \cap I \ne \emptyset \)), and 0 if no defective items are detected (i.e., \(Q \cap I = \emptyset \)).
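This test model is easy to state in code. The following is a minimal illustrative sketch (the names `make_oracle` and `oracle` are ours, not from the literature):

```python
def make_oracle(defectives):
    """Return a test oracle for a fixed defective set I.

    A test is any subset Q of the items; the oracle answers 1 if
    Q contains at least one defective item and 0 otherwise.
    """
    I = set(defectives)

    def oracle(Q):
        # Answer 1 iff Q intersects I.
        return 1 if I & set(Q) else 0

    return oracle

# Example: items X = {1,...,10} with defective set I = {3, 7}.
oracle = make_oracle({3, 7})
print(oracle({1, 2, 4}))   # -> 0: no defective item in Q
print(oracle({2, 3, 5}))   # -> 1: item 3 is defective
```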

Originally developed as an efficient method for large-scale blood testing (Dorfman 1943), group testing has since expanded to a wide array of applications. These include DNA library screening (Ngo and Du 1999), quality control in manufacturing (Sobel and Groll 1959), file searching in storage systems (Kautz and Singleton 1964), sequential screening of experimental variables (Li 1962), efficient contention resolution algorithms for multiple-access communication (Kautz and Singleton 1964; Wolf 1985), data compression (Hong and Ladner 2002), and computations in data stream models (Cormode and Muthukrishnan 2005). For a more detailed exploration of group testing’s historical development and its varied applications, readers are directed to Cicalese et al. (2013); Du and Hwang (2000, 2006); Hwang (1972); Macula and Popyack (2004); Ngo and Du (1999) and the references therein.

Adaptive algorithms in group testing design their tests based on the results of prior tests, while non-adaptive algorithms execute tests that do not depend on each other, enabling simultaneous testing in a single phase. Due to their efficiency and the ability to perform all tests simultaneously, non-adaptive algorithms are frequently favored across different group testing scenarios (Du and Hwang 2000, 2006).

The task of estimating the number of defective items, \(d:=|I|\), within a constant factor \(\alpha \) is to find an integer D such that \(d \le D \le \alpha d\). This problem has broad applications across various domains (Chen and Swallow 1990; Swallow 1985; Thompson 1962; Walter et al. 1980; Gastwirth and Hammick 1989), and estimating |I| in a set X has received considerable attention (Bshouty et al. 2017; Cheng and Xu 2014; Damaschke et al. 2010; Damaschke and Muhammad 2010; Falahatgar et al. 2016; Ron and Tsur 2016). Our research in this paper is specifically dedicated to exploring this problem in the non-adaptive setting.

Bshouty (2019) showed that any deterministic non-adaptive algorithm for estimating the number of defective items requires at least \(\Omega (n)\) tests. For randomized algorithms, Damaschke and Muhammad (2010) introduced a non-adaptive randomized algorithm that makes \(O(\log n)\) tests and, with high probability, returns an integer D satisfying \(D\ge d\) and \(\textbf{E}[D]=O(d)\). Furthermore, Bshouty (2019) developed a polynomial-time randomized algorithm that makes \(O(\log n)\) tests and, with probability at least 2/3, returns an estimate of the number of defective items within a constant factor. This algorithm can be readily adapted into one that makes \(O(\log n/\log \alpha )\) tests and, with probability at least 2/3, estimates the number of defective items to within a factor of \(\alpha \).
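To build intuition for how \(O(\log n)\) non-adaptive tests can yield such an estimate, here is a simplified sketch in the spirit of these algorithms (it is not the exact algorithm of either paper; the sampling scheme and names are illustrative, and it only yields a rough estimate with constant probability):

```python
import random

def estimate_defectives(n, oracle, seed=None):
    """Non-adaptive sketch: one random test per scale 2^0, 2^1, ..., ~n.

    Test i includes each item independently with probability 2^{-i}.
    If |I| = d, tests with 2^{-i} well above 1/d answer 1 with high
    probability, and tests with 2^{-i} well below 1/d answer 0, so the
    largest scale whose test answers 1 gives a rough estimate of d.
    All tests are fixed in advance and run in a single round.
    """
    rng = random.Random(seed)
    tests = [[x for x in range(1, n + 1) if rng.random() < 2.0 ** -i]
             for i in range(n.bit_length() + 1)]
    answers = [oracle(set(Q)) for Q in tests]   # one non-adaptive round
    positive = [i for i, a in enumerate(answers) if a == 1]
    return 2 ** max(positive) if positive else 0
```

Note that the \(O(\log n)\) tests here are chosen before any answer is seen, which is exactly the non-adaptive property discussed above.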

For constant-factor estimation, Damaschke and Sheikh Muhammad (Damaschke and Muhammad 2010) established a lower bound of \(\Omega (\log n)\); however, this result applies only to algorithms that choose each item of a test uniformly and independently with a fixed probability. They conjectured that any randomized algorithm, maintaining a constant probability of failure, would also need \(\Omega (\log n)\) tests. This conjecture was validated by Ron and Tsur (2016) and independently by Bshouty (2019), albeit weakened by a \(\log \log n\) factor. That is, they gave the lower bound

$$\Omega \left( \dfrac{\log n}{\log \log n}\right) .$$

In this paper, we prove a lower bound of

$$\Omega \left( \dfrac{\log n}{(c\log ^* n)^{(\log ^*n)+1}}\right) $$

tests, where c is a constant and \(\log ^*n\) is the minimal integer m such that \(\log \log {\mathop {\ldots }\limits ^{m}}\log n<2\). This lower bound also implies the lower bound

$$\Omega \left( \dfrac{\log n}{\log \log {\mathop {\ldots }\limits ^{j}}\log n}\right) $$

for any constant j.
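For concreteness, the iterated logarithm \(\log \log \cdots \log n\) and \(\log ^*n\) appearing in these bounds can be computed as follows (an illustrative sketch; the function names are ours):

```python
import math

def iterated_log(n, k):
    """log^[k] n: apply log base 2 to n exactly k times (log^[0] n = n)."""
    x = n
    for _ in range(k):
        x = math.log2(x)
    return x

def log_star(n):
    """The minimal integer m such that log^[m] n < 2."""
    m = 0
    x = n
    while x >= 2:
        x = math.log2(x)
        m += 1
    return m

print(log_star(2 ** 65536))  # 65536 -> 16 -> 4 -> 2 -> 1: five applications
```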

An attempt was made to establish this bound in Bshouty (2018), but a flaw in the proof was later identified. Consequently, a less stringent bound, \(\Omega (\log n/\log \log n)\), was proved and published in Bshouty (2019).

For estimates with a non-constant factor (such as \(\alpha =\log \log n\)), the result of  Ron and Tsur (2016) and of Bshouty (2019) can be expanded to establish a lower bound of \(\Omega (\log n / \log \max (\alpha , \log n))\). This lower bound is tight when \(\alpha \ge \log n\). In this paper, we show that, if a constant j exists such that \(\alpha >{\log \log {\mathop {\cdots }\limits ^{j}}\log n}\), then any non-adaptive randomized algorithm that, with probability at least 2/3, estimates the number of defective items |I| to within a factor \(\alpha \) requires at least

$$\Omega \left( \dfrac{\log n}{\log \alpha }\right) .$$

In this case, the lower bound matches the upper bound.

The paper is organized as follows: We begin with a subsection that outlines the method employed for proving the lower bound. In Sect. 2, we clarify the notation and terminology that will be consistently used in the paper. The demonstration of the lower bound is then presented in Sect. 3.

1.1 Old and new techniques

In this section, we describe the old and new techniques used to prove the lower bounds.

Consider a set of items \(X=[n]\), and let \({\mathcal {I}}=2^{X}\) represent all possible sets of defective items. Our goal is to find a lower bound on the number of tests required by any non-adaptive randomized algorithm that, with probability at least 2/3, produces an integer D(I) for any defective item set \(I\in {\mathcal {I}}\), where \(|I|\le D(I)\le 2|I|\).

Note that we will initially focus on constant estimation. At the end of this section, we will address the results for any estimation factor \(\alpha \).

1.1.1 Old technique

Bshouty’s method in Bshouty (2019) is as follows: Suppose a non-adaptive randomized algorithm \({\mathcal {A}}\) makes s tests. Denote by the random variable set \({\mathcal {Q}}=\{Q_1, \ldots , Q_s\}\) the tests that \({\mathcal {A}}\) makes. For each set of defective items \(I\in {\mathcal {I}}\), the algorithm produces an integer D(I) that, with probability at least 2/3, satisfies \(|I|\le D(I)\le 2|I|\).

First, Bshouty (2019) defines a partition of the test set \({\mathcal {Q}} = \bigcup _{i=1}^r {\mathcal {Q}}_i\), where each \({\mathcal {Q}}_i\), \(i\in [r]\), encompasses tests whose sizes fall within the interval \([n_i, n_{i+1}]\), where \(n_1=1\) and \(n_{i+1} = {poly}(\log n) \cdot n_i\). The total number of such intervals is \(r=\Theta (\log n/\log \log n)\). Given a large enough constant c, by Markov’s bound, there is a particular j (which depends only on \({\mathcal {A}}\), not on \({\mathcal {A}}\)’s random seed) for which, with probability at least \(1-1/c\), \(|{\mathcal {Q}}_j|\le c s/r\).

They then find an integer d that depends on j (and therefore on \({\mathcal {A}}\)), so that for each \(m\in [d,4d]\) and any uniformly random I chosen from \({\mathcal {I}}_m:=\{I\in {\mathcal {I}}: |I|=m\}\), the results of all the tests that lie outside \({\mathcal {Q}}_j\) can, with high probability, be predicted (without having to perform a test). The key idea in Bshouty (2019) is that, since every \(Q\in {\mathcal {Q}}_j\) satisfies \(n_j\le |Q|\le poly(\log n)n_j\), it is possible to find an integer d such that, for a uniformly random defective set of size \(m\in [d,4d]\), with high probability, the answers to all the tests Q with \(|Q|<n_j\) are 0 and the answers to all the tests Q with \(|Q|>poly(\log n)n_j\) are 1.

This shows that the set of tests \({\mathcal {Q}}_j\) is capable, with high probability, of estimating |I| for a uniformly random I with \(|I|\in [d,4d]\). Specifically, it can, with high probability, distinguish between a set of defective items of size d and one of size 4d.

If cs/r is less than one, then \(|{\mathcal {Q}}_j|\le cs/r<1\), and since \(|{\mathcal {Q}}_j|\) is an integer, we get \(|{\mathcal {Q}}_j|=0\) and \({\mathcal {Q}}_j=\emptyset \) (with high probability). This leads to a contradiction, since the algorithm cannot, with high probability, distinguish between the cases \(|I|=d\) and \(|I|=4d\) without any tests. Consequently, it must be that \(c s/r \ge 1\), so the number of tests s must be at least \(r/c = \Omega (\log n/\log \log n)\). This establishes a lower bound on the number of tests required by any non-adaptive randomized algorithm to estimate the size of the defective set.
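The bucket partition underlying this argument can be sketched as follows (the growth factor \((\log n)^4\) stands in for the paper's unspecified \(poly(\log n)\); the names are illustrative):

```python
import math
from collections import defaultdict

def partition_tests(tests, n):
    """Group tests by size into r = Theta(log n / log log n) buckets.

    Bucket i holds tests Q with base^i <= |Q| < base^{i+1}, where
    base = (log n)^4 plays the role of the poly(log n) growth factor,
    so roughly r = log n / (4 log log n) buckets cover sizes 1..n.
    """
    base = math.log2(n) ** 4          # illustrative poly(log n) factor
    buckets = defaultdict(list)
    for Q in tests:
        # |Q| lies in [base^i, base^{i+1}) for i = floor(log_base |Q|).
        i = int(math.log(max(len(Q), 1), base))
        buckets[i].append(Q)
    return buckets
```

By an averaging (Markov) argument as in the text, some bucket then contains at most a \(1/r\) fraction of the tests in expectation.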

1.1.2 New technique

The limitation of the approach employed in Bshouty (2019) lies in the union bound used to prove that the outcomes of the tests in \({\mathcal {Q}}\backslash {\mathcal {Q}}_j\) can be predicted without actually making the tests. Achieving a high probability of accurate prediction requires that r be sufficiently small. Consequently, to satisfy the condition \(c s/r<1\), the number of tests s must also be sufficiently small.

We surmount the bottleneck in Bshouty (2019) with the following technique. Let \(\mathcal{A}_0\) be any non-adaptive estimation algorithm, and let \(\mathcal{Q}_0\) be the set of s tests it makes. As in Bshouty (2019), we partition the set of tests \({\mathcal {Q}}_0=\cup _{i=1}^{r_0}{\mathcal {Q}}_0^{(i)}\) into \(r_0=O(\log n/\log \log n)\) sets, where each \({\mathcal {Q}}_0^{(i)}\), \(i\in [r_0]\), encompasses tests whose sizes fall within the interval \([n_i, n_{i+1}]\), where \(n_1=1\) and \(n_{i+1} = {poly}(\log n) \cdot n_i\). Let \(\tau =\log ^*n\). By Markov’s bound, there exists j such that, with probability at least \(1-1/\tau \), we have \(|{\mathcal {Q}}_0^{(j)}|\le \tau s/r_0\). Next, we select a subset \({\mathcal {I}}_1\subset {\mathcal {I}}_0:=2^X\) such that, for a uniformly random \(I\in {\mathcal {I}}_1\), with probability at least \(1-1/\tau \), we can predict the results of the tests that are not in \({\mathcal {Q}}_0^{(j)}\).

We then give the following algorithm \({\mathcal {A}}_1\) that, with high probability, estimates |I| for any defective set \(I\in {\mathcal {I}}_1\):

Algorithm 1

Algorithm \({\mathcal {A}}_1\) for estimating |I| when \(I\in {\mathcal {I}}_1\)

We then prove that if algorithm \(\mathcal {A}_0\) successfully estimates the size of any defective set I with a probability of at least 2/3, then algorithm \(\mathcal {A}_1\) will also successfully estimate the size of any set \(I\in \mathcal {I}_1\) with a probability of at least \(2/3 - 2/\tau \).

This follows from the following facts:

1. Making the tests in \(\phi ({\mathcal {Q}}_0^{(j)})\) with the defective set of items I is the same as making the tests in \({\mathcal {Q}}_0^{(j)}\) with the defective set of items \(\phi ^{-1}(I)\).

2. For a uniformly random permutation \(\phi \), the set \(\phi ^{-1}(I)\) is a uniformly random set of size |I|.

3. Since \(\phi ^{-1}(I)\) is a uniformly random set of size |I|, the answers to the tests in \({\mathcal {Q}}_0\backslash {\mathcal {Q}}_0^{(j)}\) can be determined with high probability.

4. With high probability, the size of \(\mathcal {Q}_0^{(j)}\) is at most \(\tau s/r_0\).

Now, as before, if \(\tau s/r_0<1\), then, with high probability, \({\mathcal {Q}}_0^{(j)}=\emptyset \), and Algorithm 1 makes no tests. Furthermore, if \({\mathcal {I}}_1\) contains two instances \(I_1\) and \(I_2\) with \(4|I_1|\le |I_2|\), the outcome for \(I_1\) cannot equal the outcome for \(I_2\). This leads to a contradiction: an algorithm that makes no tests cannot distinguish between \(I_1\) and \(I_2\). This contradiction establishes the lower bound of \(r_0/\tau \) for any non-adaptive randomized algorithm that solves the estimation problem. If we stop at this point in the proof, we establish the lower bound \(r_0/\tau = \Omega ({\log n}/{(\log ^* n)\log \log n})\).
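The published figure for Algorithm 1 is not reproduced above; based on the description in this section, its logic can be sketched roughly as follows (a reconstruction with illustrative names, not the paper's verbatim pseudocode):

```python
import random

def algorithm_A1(n, tests, oracle, lo, hi, estimator):
    """Reconstruction sketch of the Algorithm 1 idea.

    `tests` maps a test name to a subset Q of [n] (the tests of A_0).
    Tests with |Q| < lo get a predicted answer of 0, and tests with
    |Q| > hi get a predicted answer of 1; only tests in the middle
    bucket Q_0^(j) are actually performed, relabeled by a uniformly
    random permutation phi.  `estimator` maps the answers to D.
    """
    phi = list(range(1, n + 1))
    random.shuffle(phi)                       # uniform random permutation
    relabel = {x: phi[x - 1] for x in range(1, n + 1)}

    answers = {}
    for name, Q in tests.items():
        if len(Q) < lo:
            answers[name] = 0                 # predicted: too small to hit I
        elif len(Q) > hi:
            answers[name] = 1                 # predicted: large enough to hit I
        else:
            answers[name] = oracle({relabel[x] for x in Q})  # real test on phi(Q)
    return estimator(answers)
```

The point of the random relabeling is exactly item 2 above: it turns \(\phi ^{-1}(I)\) into a uniformly random set of size |I|, which is what makes the predicted answers correct with high probability.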

To derive a better lower bound, we repeat the above procedure. We take the algorithm \(\mathcal{A}_1\) that solves the estimation problem for \(I\in \mathcal{I}_1\) with the tests \(\mathcal{Q}_1:={\mathcal {Q}}_0^{(j)}\) with a success probability of at least \(2/3-2/\tau \) and, employing the identical technique as before, we build a new non-adaptive algorithm \({\mathcal {A}}_2\) that solves the estimation problem for \(I\in {\mathcal {I}}_2\subset {\mathcal {I}}_1\) using the tests in \({\mathcal {Q}}_2\subset {\mathcal {Q}}_1\), with a success probability of at least \(2/3-4/\tau \). At this stage, we partition \(\mathcal{Q}_1\) into \(r_1=O(\log \log n/\log \log \log n)\) sets \({\mathcal {Q}}_1=\cup _{i=1}^{r_1}\mathcal{Q}_1^{(i)}\), where each \(\mathcal{Q}_1^{(i)}\) encompasses tests whose sizes fall within the interval \([n'_i, n'_{i+1}]\), where \(n'_1=n_j\) and \(n'_{i+1} = {poly}(\log \log n) \cdot n'_i\).

The number of tests that algorithm \({\mathcal {A}}_2\) makes is \(\tau ^2s/(r_0r_1)\). The lower bound achieved here is now \(r_0r_1/\tau ^2=\Omega (\log n/((\log ^*n)^2\log \log \log n))\), which is better than the earlier lower bound \(r_0/\tau =\Omega ({\log n}/{(\log ^* n)\log \log n})\).

By repeating this process \(\ell :=\tau /24-\log ^*\tau \) times, we end up with an algorithm that makes \(t:=\tau ^\ell s/(r_0r_1r_2r_3\cdots r_\ell )\) tests, where \(r_k=O(\log ^{[k+1]}n/\log ^{[k+2]}n)\) and \(\log ^{[i]}n={\log \log {\mathop {\cdots }\limits ^{i}}\log n}\). If \(t<1\), then the algorithm makes no tests and, with a probability of at least \(2/3-2(\tau /24)/\tau =7/12>1/2\), it can distinguish between two sets of defective items \(I_1\) and \(I_2\) that cannot have the same outcome. This gives the lower bound

$$\dfrac{r_0r_1r_2r_3\cdots r_\ell }{\tau ^{\tau /24}}=\dfrac{\log n}{(c\tau )^{\tau /24}}=\Omega \left( \dfrac{\log n}{(c\log ^* n)^{(\log ^*n)+1}}\right) $$

for some constant c.

1.1.3 Old attempt

A previous effort to establish this bound was undertaken in Bshouty (2018); however, a flaw was found in the proof. Consequently, the weaker bound of \(\Omega (\log n/\log \log n)\) was established and published in Bshouty (2019). In Bshouty (2018), Bshouty did not employ Algorithm 1. Rather, he applied the same analysis to \({\mathcal {Q}}_0^{(j)}\) (as opposed to \(\phi ({\mathcal {Q}}_0^{(j)})\)), which introduced numerous dependent events into the proof. The crucial ingredient of our analysis is the integration of the random permutation \(\phi \) into Algorithm 1. This permutation makes the events independent, allowing the repeated application of Algorithm 1 to \({\mathcal {Q}}_0^{(j)}\).

1.1.4 Any estimation \(\alpha \)

For estimates with a non-constant factor \(\alpha \) (such as \(\alpha =\log \log n\)), the algorithm of Bshouty (2019) can be readily adapted into one that makes \(O(\log n/\log \alpha )\) tests and, with probability at least 2/3, returns an estimate of the number of defective items to within a factor of \(\alpha \). The results of Ron and Tsur (2016) and of Bshouty (2019) can be expanded to establish a lower bound of

$$\Omega \left( \dfrac{\log n}{\log \max (\alpha , \log n)}\right) .$$

This lower bound is tight when \(\alpha \ge \log n\). In this paper, we show that if a constant j exists such that \(\alpha >\log ^{[j]}n\) (recall that \(\log ^{[j]}n:={\log \log {\mathop {\cdots }\limits ^{j}}\log n}\)), then any non-adaptive randomized algorithm that, with probability at least 2/3, estimates the number of defective items |I| to within a factor \(\alpha \) requires at least

$$\Omega \left( \dfrac{\log n}{\log \alpha }\right) .$$

In this case, the lower bound matches the upper bound.

The proof follows the same procedure as described above, with two modifications:

1. Rather than starting with \(\tau =\log ^*n\), we start with \(\tau =10k\), where k is the maximum integer for which \(\alpha > \log ^{[k]}n\).

2. When we reach stage \(k-1\), the set \(\mathcal{I}_{k-1}\) contains defective sets whose sizes differ by a factor of \(\log ^{[k-1]}n>\alpha \) and therefore cannot be distinguished by the estimation algorithm with no tests. However, \(\mathcal{I}_k\) will not contain such sets because \(\alpha >\log ^{[k]}n\). Thus, at stage k we partition \(\mathcal{Q}_{k-1}\) into \(r_{k-1}=O({\log ^{[k-1]}n}/\log \alpha )\) sets \({\mathcal {Q}}_{k-1}=\cup _{i=1}^{r_{k-1}}\mathcal{Q}_{k-1}^{(i)}\), where each \(\mathcal{Q}_{k-1}^{(i)}\) encompasses tests whose sizes fall within the interval \([n'_i, n'_{i+1}]\), where \(n'_{i+1} = \alpha ^2 \cdot n'_i\).

Those changes give the lower bound

$$\dfrac{r_0r_1r_2r_3\cdots r_{k-1}}{\tau ^{\tau /24}}=\dfrac{\log n}{(c\tau )^{\tau /24}\log \alpha }=\Omega \left( \dfrac{\log n}{\log \alpha }\right) .$$

2 Definitions and notation

In this section, we present various definitions and establish the notation used throughout.

We define the set of items as \(X = [n] = \{1, 2, \ldots , n\}\) and the set of defective items as \(I \subseteq X\). The algorithm is given n and has access to a test oracle \({\mathcal {O}}_I\). It can use this oracle to make a test \(Q\subseteq X\), with the oracle responding \({\mathcal {O}}_I(Q):=1\) if \(Q\cap I\not =\emptyset \), and \({\mathcal {O}}_I(Q):=0\) if \(Q\cap I=\emptyset \).

An algorithm A is said to \(\alpha \)-estimate the number of defective items with a probability of at least \(1-\delta \) if, for any set \(I\subseteq X\), A runs in time polynomial in n, uses the oracle \({\mathcal {O}}_I\) to make tests, and, with probability at least \(1-\delta \), outputs an integer D satisfying \(|I|\le D\le \alpha |I|\). If \(\alpha \) is a constant (independent of n), then we say that the algorithm estimates the number of defective items within a constant factor.

The algorithm is called non-adaptive if the tests do not depend on the outcomes of previous tests, allowing all tests to be conducted simultaneously in a single step. Our goal is to develop a non-adaptive algorithm that makes a minimum number of tests and, with a probability of at least \(1-\delta \), outputs an estimation of the number of defective items within a constant factor.

We define \(\log ^{[k]}n=\log \log {\mathop {\ldots }\limits ^{k}}\log n\) with \(\log ^{[0]}n=n\). It is noted that \(\log ^{[i+1]}n=\log \log ^{[i]}n\) and \(\log ^{[i-1]}n=2^{\log ^{[i]}n}\). Let \(\mathbb {N}=\{0,1,\cdots \}\). For two real numbers \(r_1,r_2\), we denote \([r_1,r_2]=\{r\in \mathbb {N}|r_1\le r\le r_2\}\). Random variables and random sets will be presented in bold.

3 The lower bound for constant estimation

In this section, we establish the lower bound for the number of tests required by any non-adaptive randomized algorithm that \(\alpha \)-estimates the number of defective items, for any constant \(\alpha \).

3.1 Lower bound for randomized algorithm

In this section, we prove the following.

Theorem 1

Let \(\tau =\log ^*n\) and \(\alpha \) be any constant. A non-adaptive randomized algorithm that \(\alpha \)-estimates the number of defective items with a probability of at least 2/3 is required to make at least

$$\Omega \left( \dfrac{\log n}{(480\tau )^{\tau +1}}\right) $$

tests.

We begin by proving the following.

Lemma 1

Let \(n_1=n\). Let \(i\ge 1\) be an integer such that \(\log ^{[i]}n\ge \tau :=\log ^*n\). Suppose there is an integer \(n_i=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_i\) that makes

$$\begin{aligned} s_i:=\dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \end{aligned}$$
(1)

tests and for every set of defective items I of size

$$d\in D_i:=\left[ \dfrac{n}{n_i},\dfrac{n(\log ^{[i-1]}n)^{1/4}}{n_i}\right] ,$$

with probability at least \(1-\delta \), \(\alpha \)-estimates d. Then there is an integer \(n_{i+1}=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_{i+1}\) that makes

$$\begin{aligned} s_{i+1}:=\dfrac{\log ^{[i+1]} n}{(480\tau )^{\tau -i+1}} \end{aligned}$$
(2)

tests and for every set of defective items I of size

$$d\in D_{i+1}:=\left[ \dfrac{n}{n_{i+1}},\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\right] ,$$

with probability at least \(1-\delta -1/(12\tau )\), \(\alpha \)-estimates d.

Proof

Let

$$N_{i}=\left[ \dfrac{n_i}{(\log ^{[i-1]}n)^{1/4}},n_i\right] .$$

We are interested in all the tests Q made by the algorithm \({\mathcal {A}}_i\) that satisfy \(|Q|\in N_i\). We will now partition \(N_i\) into smaller subsets. Let

$$N_{i,j}=\left[ \dfrac{n_i}{(\log ^{[i]}n)^{4j+4}},\dfrac{n_i}{(\log ^{[i]}n)^{4j}}\right] $$

where \(j\in [0,r_i-1]\) and

$$\begin{aligned} r_i=\dfrac{\log ^{[i]}n}{16\log ^{[i+1]}n}. \end{aligned}$$
(3)

Since the left endpoint of the interval \(N_{i,r_i-1}\) is

$$\dfrac{n_i}{(\log ^{[i]}n)^{4(r_i-1)+4}}=\dfrac{n_i}{2^{4r_i\log ^{[i+1]}n}}=\dfrac{n_i}{2^{(1/4)\log ^{[i]}n}}=\dfrac{n_i}{(\log ^{[i-1]}n)^{1/4}}$$

and the right endpoint of \(N_{i,0}\) is \(n_i\), we have, \(N_i=\cup _{j=0}^{r_i-1}N_{i,j}\).

Let \({\varvec{\mathcal {Q}}}=\{\varvec{Q}_1,\ldots ,\varvec{Q}_{s_i}\}\) be the random variable tests that the randomized algorithm \({\mathcal {A}}_i\) makes. Let \(\textbf{T}_j\) be a random variable representing the number of tests \(\varvec{Q}\in \varvec{\mathcal {Q}}\) that satisfy \(|\varvec{Q}|\in N_{i,j}\). Since \({\mathcal {A}}_i\) makes \(s_i\) tests, we have \(\textbf{T}_0+\cdots +\textbf{T}_{r_i-1}\le s_i\). Therefore, by (1) and (3) (in the expectation \(\textbf{E}_j\), j is uniformly random over \([0,r_i-1]\), and the other \(\textbf{E}\) is over the random seed of the algorithm \({\mathcal {A}}_i\)),

$$\textbf{E}_j\left[ \textbf{E}[\textbf{T}_j]\right] =\textbf{E}\left[ \textbf{E}_j[\textbf{T}_j]\right] \le \dfrac{s_i}{r_i}=\dfrac{16 \log ^{[i+1]}n}{(480\tau )^{\tau -i+2}}.$$

Therefore, there exists \(0\le j_i\le r_i-1\) that depends solely on the algorithm \({\mathcal {A}}_i\) (not the algorithm’s seed) such that

$$\textbf{E}[\textbf{T}_{j_i}]\le \frac{s_i}{r_i}=\dfrac{16 \log ^{[i+1]}n}{(480\tau )^{\tau -i+2}}.$$

By Markov’s bound, with probability at least \(1-16/(480\tau )=1-1/(30\tau )\),

$$\begin{aligned} |\{\textbf{Q}\in \varvec{\mathcal {Q}}:|\textbf{Q}|\in N_{i,j_i}\}|=\textbf{T}_{j_i}\le \frac{(480\tau )s_i}{16r_i}=\dfrac{\log ^{[i+1]}n}{(480\tau )^{\tau -i+1}}=s_{i+1}. \end{aligned}$$
(4)

Define

$$\begin{aligned} n_{i+1}=\dfrac{n_i}{(\log ^{[i]}n)^{4j_i+2}}. \end{aligned}$$
(5)

Since \(n_i=n^{\Omega (1)}\) and

$$(\log ^{[i]}n)^{4j_i+2}\le (\log ^{[i]}n)^{4r_i-2}=\dfrac{(\log ^{[i-1]}n)^{1/4}}{(\log ^{[i]}n)^2},$$

we have that \(n_{i+1}=n^{\Omega (1)}\le n\). Notice that this holds even for \(i=1\). This is because \(n_1=n\) and \((\log ^{[0]}n)^{1/4}=n^{1/4}\), so \(n_2\ge n^{3/4}\log ^2n=n^{\Omega (1)}\).

Consider the following randomized algorithm \({\mathcal {A}}_{i}'\):

Algorithm 2

Algorithm \(\mathcal{A}'_i\)

Consider the following algorithm \({\mathcal {A}}_{i+1}\):

Algorithm 3

Algorithm \({\mathcal {A}}_{i+1}\)

We now show that for every set of defective items I of size

$$\begin{aligned} d\in D_{i+1}:=\left[ \dfrac{n}{n_{i+1}},\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\right] , \end{aligned}$$
(6)

with probability at least \(1-\delta -1/(12\tau )\), algorithm \({\mathcal {A}}_{i+1}\) \(\alpha \)-estimates d using \(s_{i+1}\) tests.

In algorithm \({\mathcal {A}}_{i+1}\), Step 8 is the only step that makes tests. Therefore, by step 7 the test complexity of \({\mathcal {A}}_{i+1}\) is \(s_{i+1}\).

By the definition of \(D_i\), and since, by (5), \(n/n_{i+1}>n/n_i\), and, by (5) and (3)

$$\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}=\dfrac{n}{n_i}(\log ^{[i]}n)^{4j_i+2.25}\le \dfrac{n}{n_i}(\log ^{[i]}n)^{4r_i-1.75}\le \dfrac{n(\log ^{[i-1]}n)^{1/4}}{n_i},$$

we can conclude that \(D_{i+1}\subset D_i\).

Consider the following events:

1. Event \(M_0\): For some \(\varvec{Q}'=\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}'\) such that

    $$|\varvec{Q}'|\le \frac{n_i}{(\log ^{[i]}n)^{4j_i+4}}$$

    (i.e., \(\varvec{Q}'\in \varvec{\mathcal {Q}}_0\)), we have \(\varvec{Q}'\cap I\not =\emptyset \) (i.e., the answer to the test \(\varvec{Q}\) is 1).

2. Event \(M_1\): For some \(\varvec{Q}'=\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}'\) such that

    $$|\varvec{Q}'|\ge \frac{n_i}{(\log ^{[i]}n)^{4j_i}}$$

    (i.e., \(\varvec{Q}'\in \varvec{\mathcal {Q}}_1\)), we have \(\varvec{Q}'\cap I=\emptyset \) (i.e., the answer to the test \(\varvec{Q}\) is 0).

3. Event W:

    $$|\varvec{\mathcal {Q}}''|>s_{i+1}=\dfrac{\log ^{[i+1]}n}{(480\tau )^{\tau -i+1}}.$$

The success probability of the algorithm \({\mathcal {A}}_{i+1}\) on a set I of defective items with \(|I|=d\in D_{i+1}\) is (here the probability is over \(\varvec{\phi }\) and the random tests \(\varvec{\mathcal {Q}}\))

$$\begin{aligned} \textbf{Pr}[{\mathcal {A}}_{i+1}\text { succeeds on }I]&= \textbf{Pr}[({\mathcal {A}}_{i}'\text { succeeds on }I)\wedge \overline{M_0}\wedge \overline{M_1}\wedge \overline{W}]\\&\ge \textbf{Pr}[{\mathcal {A}}_i'\text { succeeds on }I]-\textbf{Pr}[M_0\vee M_1\vee W]\\&\ge \textbf{Pr}[{\mathcal {A}}_i'\text { succeeds on }I]-\textbf{Pr}[M_0]-\textbf{Pr}[M_1]-\textbf{Pr}[W]. \end{aligned}$$

Now, since \(|\varvec{\phi }^{-1}(I)|=|I|\) and \(\varvec{Q}'\cap I=\varvec{\phi }(\varvec{Q})\cap I\not =\emptyset \) if and only if \(\varvec{Q}\cap \varvec{\phi }^{-1}(I)\not =\emptyset \),

$$ \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[{\mathcal {A}}_i' \text{ succeeds } \text{ on } I]=\underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[{\mathcal {A}}_i \text{ succeeds } \text{ on } \phi ^{-1}(I)]\ge 1-\delta . $$

Therefore, to get the result, it is enough to show that \(\textbf{Pr}[M_0]\le 1/(300\tau )\), \(\textbf{Pr}[M_1]\le 1/(300\tau )\), and \(\textbf{Pr}[W]\le 1/(30\tau )\).

First, since \(|\varvec{Q}'|=|\varvec{\phi }(\varvec{Q})|=|\varvec{Q}|\) we have

$$|\varvec{\mathcal {Q}}''|=|\{\varvec{Q}_i':|\varvec{Q}_i'|\in N_{i,j_i}\}|=|\{\varvec{Q}_i:|\varvec{Q}_i|\in N_{i,j_i}\}|=\textbf{T}_{j_i}.$$

By (4), with probability at most \(1/(30\tau )\),

$$|\varvec{\mathcal {Q}}''|=\textbf{T}_{j_i}>\dfrac{\log ^{[i+1]}n}{(480\tau )^{\tau -i+1}}.$$

Therefore, \(\textbf{Pr}[W]\le 1/(30\tau )\).

We now show that \(\textbf{Pr}[M_0]\le 1/(300\tau )\). We have (a detailed explanation of every step follows):

$$\begin{aligned} \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[M_0]= & \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[(\exists \varvec{Q}\in \varvec{\mathcal {Q}},\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}_0) \ \varvec{\phi }(\varvec{Q})\cap I\not =\emptyset ]\end{aligned}$$
(7)
$$\begin{aligned}= & \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[(\exists \varvec{Q}\in \varvec{\mathcal {Q}},\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}_0) \ \varvec{Q}\cap \varvec{\phi }^{-1}(I)\not =\emptyset ]\end{aligned}$$
(8)
$$\begin{aligned}\le & s_i \left( 1-\prod _{k=0}^{d-1}\left( 1-\dfrac{n_i}{(\log ^{[i]}n)^{4j_i+4}(n-k)}\right) \right) \end{aligned}$$
(9)
$$\begin{aligned}\le & s_i \left( 1-\left( 1-\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n}\right) ^d\right) \end{aligned}$$
(10)
$$\begin{aligned}\le & s_i d\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n}\end{aligned}$$
(11)
$$\begin{aligned}\le & \dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \cdot \dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n} \end{aligned}$$
(12)
$$\begin{aligned}= & \dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \cdot \dfrac{n(\log ^{[i]}n)^{4j_i+2\frac{1}{4}}}{n_{i}}\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n} \end{aligned}$$
(13)
$$\begin{aligned}= & \dfrac{2}{(480\tau )^{\tau -i+2}(\log ^{[i]}n)^{3/4}} \le \dfrac{1}{300\tau }. \end{aligned}$$
(14)

(7) is derived from the definition of the event \(M_0\). (8) follows from the observation that, for any permutation \(\phi :[n]\rightarrow [n]\) and two subsets \(X,Y\subseteq [n]\), the condition \(\phi (X)\cap Y\not =\emptyset \) is equivalent to \(X\cap \phi ^{-1}(Y)\not =\emptyset \). (9) follows from:

1. The application of the union bound and the fact that \(|\varvec{\mathcal {Q}}_0|\le |\varvec{\mathcal {Q}}|= s_i\).

2. For a uniformly random \(\varvec{\phi }\) and a d-subset \(I\subset [n]\), \(\varvec{\phi }^{-1}(I)\) is a uniformly random d-subset of [n].

3. For every \(\varvec{Q}\in \varvec{\mathcal {Q}}\) for which \(\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}_0\), we have

    $$|\varvec{Q}|=|\varvec{\phi }(\varvec{Q})|\le {n_i}/{(\log ^{[i]}n)^{4j_i+4}}.$$

(10) follows because \(d\in D_{i+1}\) and \(n_{i+1}=n^{\Omega (1)}\) imply, by (6), that \(d\le n/2\). (11) follows from the inequality \((1-x)^d\ge 1-dx\). (12) follows from (1) and (6). (13) follows from (5). Finally, (14) follows from the fact that, since \(\log ^{[i]}n\ge \tau =\log ^*n\), it holds that \(i\le \tau \), and therefore \((480\tau )^{\tau -i+2}\ge (480\tau )^{2}\ge 600\tau \).
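For the reader checking the arithmetic, the exponent bookkeeping behind (13) and (14) is:

```latex
\frac{\log^{[i]} n}{(480\tau)^{\tau-i+2}}
  \cdot \frac{n\,(\log^{[i]}n)^{4j_i+\frac{9}{4}}}{n_i}
  \cdot \frac{2 n_i}{(\log^{[i]}n)^{4j_i+4}\, n}
  = \frac{2\,(\log^{[i]}n)^{1+4j_i+\frac{9}{4}-4j_i-4}}{(480\tau)^{\tau-i+2}}
  = \frac{2}{(480\tau)^{\tau-i+2}(\log^{[i]}n)^{3/4}}.
```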

We now demonstrate that \(\textbf{Pr}[M_1] \le \frac{1}{300\tau }\).

$$\begin{aligned} \textbf{Pr}_{\varvec{\phi },\varvec{\mathcal {Q}}}[M_1]= & \textbf{Pr}_{\varvec{\phi },\varvec{\mathcal {Q}}}\left[ \left( \exists \varvec{Q} \in \varvec{\mathcal {Q}} \text { s.t. } \varvec{\phi }(\varvec{Q}) \in \varvec{\mathcal {Q}}_1\right) \wedge \left( \varvec{\phi }(\varvec{Q}) \cap I = \emptyset \right) \right] \end{aligned}$$
(15)
$$\begin{aligned}= & \textbf{Pr}_{\varvec{\phi },\varvec{\mathcal {Q}}}\left[ \left( \exists \varvec{Q} \in \varvec{\mathcal {Q}} \text { s.t. } \varvec{\phi }(\varvec{Q}) \in \varvec{\mathcal {Q}}_1\right) \wedge \left( \varvec{Q} \cap \varvec{\phi }^{-1}(I) = \emptyset \right) \right] \end{aligned}$$
(16)
$$\begin{aligned}\le & s_i \prod _{k=0}^{d-1}\left( 1 - \frac{n_i}{(\log ^{[i]} n)^{4j_i}(n-k)}\right) \end{aligned}$$
(17)
$$\begin{aligned}\le & s_i \left( 1 - \frac{n_i}{(\log ^{[i]} n)^{4j_i} n}\right) ^d\nonumber \\\le & s_i \exp \left( -\frac{dn_i}{(\log ^{[i]} n)^{4j_i} n}\right) \end{aligned}$$
(18)
$$\begin{aligned}\le & \frac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \exp \left( -\frac{\frac{n}{n_{i+1}} \frac{n_i}{(\log ^{[i]} n)^{4j_i}}}{n}\right) \end{aligned}$$
(19)
$$\begin{aligned}\le & \frac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \exp \left( -(\log ^{[i]} n)^2\right) \end{aligned}$$
(20)
$$\begin{aligned}\le & \frac{1}{300\tau }. \end{aligned}$$
(21)

(15) is derived from the definition of the event \(M_1\). (16) follows from the observation that for any permutation \(\phi :[n]\rightarrow [n]\) and two subsets \(X,Y\subseteq [n]\), the condition \(\phi (X)\cap Y = \emptyset \) is equivalent to \(X\cap \phi ^{-1}(Y) = \emptyset \). In (17), we again apply the union bound, \(|\varvec{\mathcal {Q}}_1| \le |\varvec{\mathcal {Q}}| = s_i\), the fact that \(\phi ^{-1}(I)\) is a random uniform \(d\)-subset, and for \(\varvec{Q}' \in \varvec{\mathcal {Q}}_1\), \(|\varvec{Q}'| \ge {n_i}/{(\log ^{[i]} n)^{4j_i}}\). (18) follows from the inequalities \(\left( 1 - {y}/{(n-k)}\right) \le \left( 1 - {y}/{n}\right) \) and \(1-x \le e^{-x}\) for all \(x\) and \(y \ge 0\). (19) follows from (1) and (6). (20) is based on (5). (21) follows from the fact that \(i \le \tau \), implying that \((480\tau )^{\tau -i+2} \ge (480\tau )^{2} \ge 300\tau \).
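Explicitly, the substitution of (5) that justifies step (20) is:

```latex
\frac{\frac{n}{n_{i+1}}\cdot\frac{n_i}{(\log^{[i]}n)^{4j_i}}}{n}
  = \frac{n_i}{n_{i+1}\,(\log^{[i]}n)^{4j_i}}
  = \frac{(\log^{[i]}n)^{4j_i+2}}{(\log^{[i]}n)^{4j_i}}
  = (\log^{[i]}n)^2 .
```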

This completes the proof. \(\square \)

We are now ready to prove Theorem 1.

Proof

Assume, to the contrary, that there exists a non-adaptive randomized algorithm \({\mathcal {A}}_1\) which, with a probability of at least \(2/3\), \(\alpha \)-estimates the number of defective items while making

$$ m:= \frac{\log n}{(480\tau )^{\tau +1}} $$

tests. Note that \(n_1 = n\) and \(\log ^{[0]} n = n\). We will apply Lemma 1 with \(\delta = 1/3\), \(D_1 = [1, n^{1/4}]\), and \(s_1 = m\).

Let \(\ell \) be an integer such that \(\log \log ^* n < \log ^{[\ell ]} n \le \log ^* n = \tau \). Consequently, we have \(\log ^{[\ell -1]} n \ge 2^{\log ^{[\ell ]} n} > 2^{\log \log ^* n} = \log ^* n = \tau \). We can now apply Lemma 1 repeatedly for \(i = 1, \ldots , \ell - 1\) to derive

$$ s_\ell = \frac{\log ^{[\ell ]} n}{(480\tau )^{\tau - \ell + 2}} \le \frac{\tau }{(480\tau )^2} < 1. $$

Consequently, the algorithm \({\mathcal {A}}_\ell \) performs no tests and achieves an \(\alpha \)-estimation of the size of the defective set \(I\) with probability at least \(2/3 - \frac{\ell }{12\tau } \ge \frac{7}{12} > \frac{1}{2}\), given that

$$ |I| \in D_\ell = \left[ \frac{n}{n_\ell }, \frac{n(\log ^{[\ell -1]} n)^{1/4}}{n_\ell }\right] . $$

Specifically, with a probability exceeding \({1}/{2}\), we are able to differentiate between defective sets of size \({n}/{n_\ell }\) and those larger than \(\alpha {n}/{n_\ell }\) without conducting any tests, which is not feasible.

This results in a contradiction. \(\square \)
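The quantitative heart of the argument above, that \(s_\ell \) drops below 1, can be illustrated for a concrete \(n\). The following Python sketch (ours; it uses the integer floor of \(\log _2\) as a stand-in for \(\log \), and the helper names are ours) computes \(\tau = \log ^* n\), selects \(\ell \), and checks that \(s_\ell < 1\):

```python
# Numerical illustration (ours, not part of the proof): with ilog2 as an
# integer stand-in for log, compute tau = log* n, pick ell with
# log(log* n) < log^[ell] n <= log* n, and check that
# s_ell = log^[ell] n / (480*tau)^(tau - ell + 2) is below 1.
def ilog2(n: int) -> int:
    """Floor of log2(n) via bit length (exact for big integers)."""
    return n.bit_length() - 1

def iter_log(n: int, i: int) -> int:
    """log^[i] n: apply ilog2 i times."""
    for _ in range(i):
        n = ilog2(n)
    return n

def log_star(n: int) -> int:
    """log* n: number of ilog2 applications needed to reach 1."""
    count = 0
    while n > 1:
        n = ilog2(n)
        count += 1
    return count

n = 2 ** 65536
tau = log_star(n)                 # tau = 5 for this n
ell = next(i for i in range(1, tau + 1)
           if ilog2(tau) < iter_log(n, i) <= tau)   # here ell = 3
# s_ell < 1 iff its numerator is smaller than its denominator
assert iter_log(n, ell) < (480 * tau) ** (tau - ell + 2)
```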

4 The lower bound for \(\alpha \)-estimation

In this section, we sketch the proof of the following tight lower bound.

Theorem 2

Let \(\alpha _n\) be any function of n such that there exists a constant k (independent of n) with \(\log ^{[k-1]}n>\alpha _n \ge \log ^{[k]}n\). Any non-adaptive randomized algorithm that \(\alpha _n\)-estimates the number of defective items with a probability of at least 2/3 makes at least

$$\Omega \left( \dfrac{\log n}{\log \alpha _n}\right) $$

tests.

We now sketch the proof.

First, it is sufficient to prove the result for

$$\begin{aligned} (\log ^{[k-1]}n)^{1/c}>\alpha \ge \log ^{[k]}n \end{aligned}$$
(22)

for a sufficiently large constant c and any constant k. This is because any \(\alpha \)-estimation with \(\log ^{[k-1]}n\ge \alpha \ge (\log ^{[k-1]}n)^{1/c}\) is also a \((\log ^{[k-1]}n)\)-estimation, and in this range \(\log \alpha = \Theta (\log ^{[k]}n)\), so the two lower bounds agree up to a constant factor.

Let \(\alpha =\alpha _n\) and \(\tau =2k\). Notice that \(\tau =O(1)\). Assume that there is an algorithm that makes

$$m:=\frac{\log n}{(480\tau )^{\tau +1}\log \alpha }$$

tests and with probability at least 2/3 returns an \(\alpha \)-estimation of the number of defective items.

We first use the following lemma, whose proof is identical to that of Lemma 1.

Lemma 2

Let \(n_1=n\). Let \(i\ge 1\) be an integer such that \(\log ^{[i+1]}n\ge \alpha \). Suppose there is an integer \(n_i=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_i\) that makes

$$\begin{aligned} s_i:=\dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}\log \alpha } \end{aligned}$$
(23)

tests and for every set of defective items I of size

$$d\in D_i:=\left[ \dfrac{n}{n_i},\dfrac{n(\log ^{[i-1]}n)^{1/4}}{n_i}\right] ,$$

with probability at least \(1-\delta \), \(\alpha \)-estimates d. Then there is an integer \(n_{i+1}=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_{i+1}\) that makes

$$\begin{aligned} s_{i+1}:=\dfrac{\log ^{[i+1]} n}{(480\tau )^{\tau -i+1}\log \alpha } \end{aligned}$$
(24)

tests and for every set of defective items I of size

$$d\in D_{i+1}:=\left[ \dfrac{n}{n_{i+1}},\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\right] ,$$

with probability at least \(1-\delta -1/(12\tau )\), \(\alpha \)-estimates d.

We apply the above lemma repeatedly, starting from \(s_1=m\) with \(i=1\) and continuing up to \(i=k-1\), to derive an algorithm \(\mathcal{A}_k\). This algorithm makes

$$s_k:=\dfrac{\log ^{[k]} n}{(480\tau )^{\tau -k+2}\log \alpha }$$

tests and for every set of defective items \(I\) of size

$$d\in D_k:=\left[ \dfrac{n}{n_k},\dfrac{n(\log ^{[k-1]}n)^{1/4}}{n_k}\right] ,$$

with probability \(2/3-(k-1)/(12\tau )\ge 5/8\), \(\alpha \)-estimates d. We note that the constraint in (22) ensures \(s_k > 1\), which is required for our analysis.

We then prove the following lemma, whose proof, sketched below, is similar to that of Lemma 1.

Lemma 3

Let \(n_1=n\). Suppose there is an integer \(n_k=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_k\) that makes

$$\begin{aligned} s_k:=\dfrac{\log ^{[k]} n}{(480\tau )^{\tau -k+2}\log \alpha } \end{aligned}$$
(25)

tests and for every set of defective items I of size

$$d\in D_k:=\left[ \dfrac{n}{n_k},\dfrac{n(\log ^{[k-1]}n)^{1/4}}{n_k}\right] ,$$

with probability at least 5/8, \(\alpha \)-estimates d. Then there is an integer \(n_{k+1}=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_{k+1}\) that makes

$$\begin{aligned} s_{k+1}:=\dfrac{4}{(480\tau )^{\tau -k+1}} \end{aligned}$$
(26)

tests and for every set of defective items I of size

$$d\in D_{k+1}:=\left[ \dfrac{n}{n_{k+1}},\dfrac{n\alpha ^4}{n_{k+1}}\right] ,$$

with probability at least \(5/8-1/(12\tau )\ge 9/16\), \(\alpha \)-estimates d.

Before we sketch the proof, we show how this lemma completes the proof of Theorem 2.

Recall that \(\tau =2k\). Notice that \(s_{k+1}<1\) and, therefore, algorithm \(\mathcal{A}_{k+1}\) makes no tests and with probability \(9/16>1/2\) can distinguish between a defective set of size \((n/n_{k+1})\in D_{k+1}\) and one of size \(\alpha ^2(n/n_{k+1})\in D_{k+1}\), which leads to a contradiction. Therefore, any non-adaptive randomized algorithm that \(\alpha \)-estimates the number of defective items with a probability of at least 2/3 makes at least

$$m=\Omega \left( \frac{\log n}{(480\tau )^{\tau +1}\log \alpha }\right) =\Omega \left( \frac{\log n}{\log \alpha }\right) $$

tests.

To prove Lemma 3, we use the same proof as for Lemma 1 with the following changes.

We have

$$N_{k}=\left[ \dfrac{n_k}{(\log ^{[k-1]}n)^{1/4}},n_k\right] $$

as in Lemma 1 but the partition will be into the following sets

$$N_{k,j}=\left[ \frac{n_k}{\alpha ^{16j+16}},\frac{n_k}{\alpha ^{16j}}\right] $$

where \(j\in [0,r_k-1]\) and (see (3))

$$r_k=\frac{\log ^{[k]}n}{64\log \alpha }.$$

Then, as in the proof of Lemma 1, there exists a \(j_k\) such that, with probability at least \(1-1/(30\tau )\), the number of tests with sizes in \(N_{k,j_k}\) is at most (see (4))

$$s_{k+1}=\frac{(480\tau )s_k}{16r_k}=\dfrac{4}{(480\tau )^{\tau -k+1}}.$$
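For completeness, the last equality follows by substituting (25) and the definition of \(r_k\):

$$s_{k+1}=\frac{(480\tau )s_k}{16r_k}=\frac{480\tau }{16}\cdot \frac{\log ^{[k]} n}{(480\tau )^{\tau -k+2}\log \alpha }\cdot \frac{64\log \alpha }{\log ^{[k]} n}=\frac{64\cdot 480\tau }{16\,(480\tau )^{\tau -k+2}}=\frac{4}{(480\tau )^{\tau -k+1}}.$$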

We now define (see (5))

$$n_{k+1}=\frac{n_k}{\alpha ^{16j_k+8}}.$$

Then \(\varvec{\mathcal {Q}}_0\) and \(\varvec{\mathcal {Q}}_1\) in Algorithm 3 are defined as follows:

$$\varvec{\mathcal {Q}}_0:=\left\{ \varvec{Q}'\in \varvec{\mathcal {Q}}':|\varvec{Q}'|\le \frac{n_k}{\alpha ^{16j_k+16}}\right\} ,$$

and

$$\varvec{\mathcal {Q}}_1:=\left\{ \varvec{Q}'\in \varvec{\mathcal {Q}}':|\varvec{Q}'|\ge \frac{n_k}{\alpha ^{16j_k}}\right\} .$$

Now it is straightforward to verify (7)-(14) and (15)-(21), which completes the proof of the lemma.

5 Conclusion

In this paper, we introduce a novel methodology for deriving lower bounds in non-adaptive randomized group testing. Our approach yields a lower bound of

$$ \Omega \left( \frac{\log n}{(c \log ^* n)^{\log ^* n}}\right) $$

for some constant \(c\) on the test complexity of any non-adaptive randomized algorithm that estimates the number of defective items within a constant factor. Furthermore, we prove a tight lower bound of

$$ \Omega \left( \frac{\log n}{\log \alpha }\right) $$

for \(\alpha \)-estimation when \(\alpha > \log ^{[j]} n\) for a given constant \(j\). These two lower bounds represent a significant improvement over the prior bound of

$$ \Omega \left( \frac{\log n}{\log \log n}\right) $$

established in Bshouty (2019); Ron and Tsur (2016).

The key innovation in our work was the incorporation of a random permutation \(\phi \) in Algorithm 1, which enabled us to achieve independence among events and allowed for a repeated analysis of the set \({\mathcal {Q}}^{(j)}\). This pivotal step helped us circumvent the limitations of earlier techniques and attain a stronger lower bound.

An intriguing open question remains: establishing a lower bound of \(\Omega (\log n)\) for the test complexity in non-adaptive randomized algorithms that estimate the number of defective items within a constant factor.