1 Introduction

Consider a collection X comprising n items, with a subset \(I \subseteq X\) representing the defective items. Within group testing, a test involves examining a subset \(Q \subseteq X\), yielding a result of 1 if any defective item is present (i.e., \(Q \cap I \ne \emptyset \)), and 0 if no defective items are detected (i.e., \(Q \cap I = \emptyset \)).
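This test model is easy to state in code. The following is a minimal illustrative sketch (the names `make_oracle` and `oracle` are ours, not from the literature):

```python
def make_oracle(defectives):
    """Return a test oracle for a fixed defective set I.

    A test is any subset Q of the items; the oracle answers 1 if
    Q contains at least one defective item and 0 otherwise.
    """
    I = set(defectives)

    def oracle(Q):
        # Answer 1 iff Q intersects I.
        return 1 if I & set(Q) else 0

    return oracle

# Example: items X = {1,...,10} with defective set I = {3, 7}.
oracle = make_oracle({3, 7})
print(oracle({1, 2, 4}))   # -> 0: no defective item in Q
print(oracle({2, 3, 5}))   # -> 1: item 3 is defective
```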

Originally developed as an efficient method for large-scale blood testing (Dorfman 1943), group testing has since expanded to a wide array of applications. These include DNA library screening (Ngo and Du 1999), quality control in manufacturing (Sobel and Groll 1959), file searching in storage systems (Kautz and Singleton 1964), sequential screening of experimental variables (Li 1962), efficient contention resolution algorithms for multiple-access communication (Kautz and Singleton 1964; Wolf 1985), data compression (Hong and Ladner 2002), and computations in data stream models (Cormode and Muthukrishnan 2005). For a more detailed exploration of group testing’s historical development and its varied applications, readers are directed to Cicalese et al. (2013); Du and Hwang (2000, 2006); Hwang (1972); Macula and Popyack (2004); Ngo and Du (1999) and the references therein.

Adaptive algorithms in group testing design their tests based on the results of prior tests, while non-adaptive algorithms execute tests that do not depend on each other, enabling simultaneous testing in a single phase. Due to their efficiency and the ability to perform all tests simultaneously, non-adaptive algorithms are frequently favored across different group testing scenarios (Du and Hwang 2000, 2006).

The task of estimating the number of defective items, \(d:=|I|\), within a constant factor \(\alpha \) is to find an integer D such that \(d \le D \le \alpha d\). This problem has broad applications across various domains (Chen and Swallow 1990; Swallow 1985; Thompson 1962; Walter et al. 1980; Gastwirth and Hammick 1989), and estimating |I| in a set X has received considerable attention (Bshouty et al. 2017; Cheng and Xu 2014; Damaschke et al. 2010; Damaschke and Muhammad 2010; Falahatgar et al. 2016; Ron and Tsur 2016). Our research in this paper is specifically dedicated to exploring this problem in the non-adaptive setting.

Bshouty (2019) showed that any deterministic non-adaptive algorithm for estimating the number of defective items requires at least \(\Omega (n)\) tests. For randomized algorithms, Damaschke and Muhammad (2010) introduced a non-adaptive randomized algorithm that makes \(O(\log n)\) tests and, with high probability, returns an integer D satisfying \(D\ge d\) and \(\textbf{E}[D]=O(d)\). Furthermore, Bshouty (2019) developed a polynomial-time randomized algorithm that makes \(O(\log n)\) tests and, with probability at least 2/3, returns an estimate of the number of defective items within a constant factor. This algorithm can be readily adapted into one that makes \(O(\log n/\log \alpha )\) tests and, with probability at least 2/3, estimates the number of defective items to within a factor of \(\alpha \).
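To build intuition for how \(O(\log n)\) non-adaptive tests can yield such an estimate, here is a simplified sketch in the spirit of these algorithms (it is not the exact algorithm of either paper; the sampling scheme and names are illustrative, and it only yields a rough estimate with constant probability):

```python
import random

def estimate_defectives(n, oracle, seed=None):
    """Non-adaptive sketch: one random test per scale 2^0, 2^1, ..., ~n.

    Test i includes each item independently with probability 2^{-i}.
    If |I| = d, tests with 2^{-i} well above 1/d answer 1 with high
    probability, and tests with 2^{-i} well below 1/d answer 0, so the
    largest scale whose test answers 1 gives a rough estimate of d.
    All tests are fixed in advance and run in a single round.
    """
    rng = random.Random(seed)
    tests = [[x for x in range(1, n + 1) if rng.random() < 2.0 ** -i]
             for i in range(n.bit_length() + 1)]
    answers = [oracle(set(Q)) for Q in tests]   # one non-adaptive round
    positive = [i for i, a in enumerate(answers) if a == 1]
    return 2 ** max(positive) if positive else 0
```

Note that the \(O(\log n)\) tests here are chosen before any answer is seen, which is exactly the non-adaptive property discussed above.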

For constant-factor estimation, Damaschke and Sheikh Muhammad (Damaschke and Muhammad 2010) established a lower bound of \(\Omega (\log n)\); however, this result applies only to algorithms that choose each item of a test uniformly and independently with a fixed probability. They conjectured that any randomized algorithm, maintaining a constant probability of failure, would also need \(\Omega (\log n)\) tests. This conjecture was validated by Ron and Tsur (2016) and independently by Bshouty (2019), albeit weakened by a \(\log \log n\) factor. That is, they gave the lower bound

$$\Omega \left( \dfrac{\log n}{\log \log n}\right) .$$

In this paper, we prove a lower bound of

$$\Omega \left( \dfrac{\log n}{(c\log ^* n)^{(\log ^*n)+1}}\right) $$

tests, where c is a constant and \(\log ^*n\) is the minimal integer m such that \(\log \log {\mathop {\ldots }\limits ^{m}}\log n<2\). This lower bound also implies the lower bound

$$\Omega \left( \dfrac{\log n}{\log \log {\mathop {\ldots }\limits ^{j}}\log n}\right) $$

for any constant j.
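For concreteness, the iterated logarithm \(\log \log \cdots \log n\) and \(\log ^*n\) appearing in these bounds can be computed as follows (an illustrative sketch; the function names are ours):

```python
import math

def iterated_log(n, k):
    """log^[k] n: apply log base 2 to n exactly k times (log^[0] n = n)."""
    x = n
    for _ in range(k):
        x = math.log2(x)
    return x

def log_star(n):
    """The minimal integer m such that log^[m] n < 2."""
    m = 0
    x = n
    while x >= 2:
        x = math.log2(x)
        m += 1
    return m

print(log_star(2 ** 65536))  # 65536 -> 16 -> 4 -> 2 -> 1: five applications
```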

An attempt was made to establish this bound in Bshouty (2018), but a flaw in the proof was later identified. Consequently, a less stringent bound, \(\Omega (\log n/\log \log n)\), was proved and published in Bshouty (2019).

For estimates with a non-constant factor (such as \(\alpha =\log \log n\)), the result of  Ron and Tsur (2016) and of Bshouty (2019) can be expanded to establish a lower bound of \(\Omega (\log n / \log \max (\alpha , \log n))\). This lower bound is tight when \(\alpha \ge \log n\). In this paper, we show that, if a constant j exists such that \(\alpha >{\log \log {\mathop {\cdots }\limits ^{j}}\log n}\), then any non-adaptive randomized algorithm that, with probability at least 2/3, estimates the number of defective items |I| to within a factor \(\alpha \) requires at least

$$\Omega \left( \dfrac{\log n}{\log \alpha }\right) .$$

In this case, the lower bound matches the upper bound.

The paper is organized as follows: We begin with a subsection that outlines the method employed for proving the lower bound. In Sect. 2, we clarify the notation and terminology that will be consistently used in the paper. The demonstration of the lower bound is then presented in Sect. 3.

1.1 Old and new techniques

In this section, we describe the old and new techniques used to prove the lower bounds.

Consider a set of items \(X=[n]\), and let \({\mathcal {I}}=2^{X}\) represent all possible sets of defective items. Our goal is to find a lower bound on the number of tests required by any non-adaptive randomized algorithm that, with probability at least 2/3, produces an integer D(I) for any defective item set \(I\in {\mathcal {I}}\), where \(|I|\le D(I)\le 2|I|\).

Note that we will initially focus on constant estimation. At the end of this section, we will address the results for any estimation factor \(\alpha \).

1.1.1 Old technique

Bshouty’s method in Bshouty (2019) is as follows: Suppose a non-adaptive randomized algorithm \({\mathcal {A}}\) makes s tests. Denote by the random variable set \({\mathcal {Q}}=\{Q_1, \ldots , Q_s\}\) the tests that \({\mathcal {A}}\) makes. For each set of defective items \(I\in {\mathcal {I}}\), the algorithm produces an integer D(I) that, with probability at least 2/3, satisfies \(|I|\le D(I)\le 2|I|\).

First, Bshouty (2019) defines a partition of the test set \({\mathcal {Q}} = \bigcup _{i=1}^r {\mathcal {Q}}_i\), where each \({\mathcal {Q}}_i\), \(i\in [r]\), encompasses tests whose sizes fall within the interval \([n_i, n_{i+1}]\), where \(n_1=1\) and \(n_{i+1} = {poly}(\log n) \cdot n_i\). The total number of such intervals is \(r=\Theta (\log n/\log \log n)\). Given a large enough constant c, by Markov’s bound, there is a particular j (which depends only on \({\mathcal {A}}\), not on \({\mathcal {A}}\)’s random seed) for which, with probability at least \(1-1/c\), \(|{\mathcal {Q}}_j|\le c s/r\).

They then find an integer d that depends on j (and therefore on \({\mathcal {A}}\)), so that for each \(m\in [d,4d]\) and any uniformly random I chosen from \({\mathcal {I}}_m:=\{I\in {\mathcal {I}}: |I|=m\}\), the results of all the tests that lie outside \({\mathcal {Q}}_j\) can, with high probability, be predicted (without having to perform a test). The key idea in Bshouty (2019) is that, since every \(Q\in {\mathcal {Q}}_j\) satisfies \(n_j\le |Q|\le poly(\log n)n_j\), it is possible to find an integer d such that, for a uniformly random defective set of size \(m\in [d,4d]\), with high probability, the answers to all the tests Q with \(|Q|<n_j\) are 0 and the answers to all the tests Q with \(|Q|>poly(\log n)n_j\) are 1.

This shows that the set of tests \({\mathcal {Q}}_j\) is capable, with high probability, of estimating |I| for a uniformly random I with \(|I|\in [d,4d]\). Specifically, it can, with high probability, distinguish between a set of defective items of size d and one of size 4d.

If cs/r is less than one, then \(|{\mathcal {Q}}_j|\le cs/r<1\), and since \(|{\mathcal {Q}}_j|\) is an integer, we get \(|{\mathcal {Q}}_j|=0\) and \({\mathcal {Q}}_j=\emptyset \) (with high probability). This leads to a contradiction, since the algorithm cannot, with high probability, distinguish between the cases \(|I|=d\) and \(|I|=4d\) without any tests. Consequently, it must be that \(c s/r \ge 1\), so the number of tests s must be at least \(r/c = \Omega (\log n/\log \log n)\). This establishes a lower bound on the number of tests required by any non-adaptive randomized algorithm to estimate the size of the defective set.
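The bucket partition underlying this argument can be sketched as follows (the growth factor \((\log n)^4\) stands in for the paper's unspecified \(poly(\log n)\); the names are illustrative):

```python
import math
from collections import defaultdict

def partition_tests(tests, n):
    """Group tests by size into r = Theta(log n / log log n) buckets.

    Bucket i holds tests Q with base^i <= |Q| < base^{i+1}, where
    base = (log n)^4 plays the role of the poly(log n) growth factor,
    so roughly r = log n / (4 log log n) buckets cover sizes 1..n.
    """
    base = math.log2(n) ** 4          # illustrative poly(log n) factor
    buckets = defaultdict(list)
    for Q in tests:
        # |Q| lies in [base^i, base^{i+1}) for i = floor(log_base |Q|).
        i = int(math.log(max(len(Q), 1), base))
        buckets[i].append(Q)
    return buckets
```

By an averaging (Markov) argument as in the text, some bucket then contains at most a \(1/r\) fraction of the tests in expectation.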

1.1.2 New technique

The limitation of the approach employed in Bshouty (2019) lies in the union bound used to prove that the outcomes of the tests in \({\mathcal {Q}}\backslash {\mathcal {Q}}_j\) can be predicted without actually making the tests. Achieving a high probability of accurate prediction requires that r be sufficiently small. Consequently, to satisfy the condition \(c s/r<1\), the number of tests s must also be sufficiently small.

We surmount the bottleneck in Bshouty (2019) with the following technique. Let \(\mathcal{A}_0\) be any non-adaptive estimation algorithm, and let \(\mathcal{Q}_0\) be the set of s tests it makes. As in Bshouty (2019), we partition the set of tests \({\mathcal {Q}}_0=\cup _{i=1}^{r_0}{\mathcal {Q}}_0^{(i)}\) into \(r_0=O(\log n/\log \log n)\) sets, where each \({\mathcal {Q}}_0^{(i)}\), \(i\in [r_0]\), encompasses tests whose sizes fall within the interval \([n_i, n_{i+1}]\), where \(n_1=1\) and \(n_{i+1} = {poly}(\log n) \cdot n_i\). Let \(\tau =\log ^*n\). By Markov’s bound, there exists j such that, with probability at least \(1-1/\tau \), we have \(|{\mathcal {Q}}_0^{(j)}|\le \tau s/r_0\). Next, we select a subset \({\mathcal {I}}_1\subset {\mathcal {I}}_0:=2^X\) such that, for a uniformly random \(I\in {\mathcal {I}}_1\), with probability at least \(1-1/\tau \), we can predict the results of the tests that are not in \({\mathcal {Q}}_0^{(j)}\).

We then give the following algorithm \({\mathcal {A}}_1\) that, with high probability, estimates |I| for any defective set \(I\in {\mathcal {I}}_1\):

Algorithm 1

Algorithm \({\mathcal {A}}_1\) for estimating |I| when \(I\in {\mathcal {I}}_1\)

We then prove that if algorithm \(\mathcal {A}_0\) successfully estimates the size of any defective set I with a probability of at least 2/3, then algorithm \(\mathcal {A}_1\) will also successfully estimate the size of any set \(I\in \mathcal {I}_1\) with a probability of at least \(2/3 - 2/\tau \).

This follows from the following facts:

1. Making the tests in \(\phi ({\mathcal {Q}}_0^{(j)})\) with the defective set of items I is the same as making the tests in \({\mathcal {Q}}_0^{(j)}\) with the defective set of items \(\phi ^{-1}(I)\).

2. For a uniformly random permutation \(\phi \), the set \(\phi ^{-1}(I)\) is a uniformly random set of size |I|.

3. Since \(\phi ^{-1}(I)\) is a uniformly random set of size |I|, the answers to the tests in \({\mathcal {Q}}_0\backslash {\mathcal {Q}}_0^{(j)}\) can be determined with high probability.

4. With high probability, the size of \(\mathcal {Q}_0^{(j)}\) is at most \(\tau s/r_0\).

Now, as before, if \(\tau s/r_0<1\), then, with high probability, \({\mathcal {Q}}_0^{(j)}=\emptyset \), and Algorithm 1 makes no tests. Furthermore, if \({\mathcal {I}}_1\) contains two instances \(I_1\) and \(I_2\) with \(4|I_1|\le |I_2|\), the outcome for \(I_1\) cannot equal the outcome for \(I_2\). This leads to a contradiction: an algorithm that makes no tests cannot distinguish between \(I_1\) and \(I_2\). This contradiction establishes the lower bound of \(r_0/\tau \) for any non-adaptive randomized algorithm that solves the estimation problem. If we stop at this point in the proof, we establish the lower bound \(r_0/\tau = \Omega ({\log n}/{(\log ^* n)\log \log n})\).
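The published figure for Algorithm 1 is not reproduced above; based on the description in this section, its logic can be sketched roughly as follows (a reconstruction with illustrative names, not the paper's verbatim pseudocode):

```python
import random

def algorithm_A1(n, tests, oracle, lo, hi, estimator):
    """Reconstruction sketch of the Algorithm 1 idea.

    `tests` maps a test name to a subset Q of [n] (the tests of A_0).
    Tests with |Q| < lo get a predicted answer of 0, and tests with
    |Q| > hi get a predicted answer of 1; only tests in the middle
    bucket Q_0^(j) are actually performed, relabeled by a uniformly
    random permutation phi.  `estimator` maps the answers to D.
    """
    phi = list(range(1, n + 1))
    random.shuffle(phi)                       # uniform random permutation
    relabel = {x: phi[x - 1] for x in range(1, n + 1)}

    answers = {}
    for name, Q in tests.items():
        if len(Q) < lo:
            answers[name] = 0                 # predicted: too small to hit I
        elif len(Q) > hi:
            answers[name] = 1                 # predicted: large enough to hit I
        else:
            answers[name] = oracle({relabel[x] for x in Q})  # real test on phi(Q)
    return estimator(answers)
```

The point of the random relabeling is exactly item 2 above: it turns \(\phi ^{-1}(I)\) into a uniformly random set of size |I|, which is what makes the predicted answers correct with high probability.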

To derive a better lower bound, we repeat the above procedure. We take the algorithm \(\mathcal{A}_1\) that solves the estimation problem for \(I\in \mathcal{I}_1\) with the tests \(\mathcal{Q}_1:={\mathcal {Q}}_0^{(j)}\) with a success probability of at least \(2/3-2/\tau \) and, employing the identical technique as before, we build a new non-adaptive algorithm \({\mathcal {A}}_2\) that solves the estimation problem for \(I\in {\mathcal {I}}_2\subset {\mathcal {I}}_1\) using the tests in \({\mathcal {Q}}_2\subset {\mathcal {Q}}_1\), with a success probability of at least \(2/3-4/\tau \). At this stage, we partition \(\mathcal{Q}_1\) into \(r_1=O(\log \log n/\log \log \log n)\) sets \({\mathcal {Q}}_1=\cup _{i=1}^{r_1}\mathcal{Q}_1^{(i)}\), where each \(\mathcal{Q}_1^{(i)}\) encompasses tests whose sizes fall within the interval \([n'_i, n'_{i+1}]\), where \(n'_1=n_j\) and \(n'_{i+1} = {poly}(\log \log n) \cdot n'_i\).

The number of tests that algorithm \({\mathcal {A}}_2\) makes is \(\tau ^2s/(r_0r_1)\). The lower bound achieved here is now \(r_0r_1/\tau ^2=\Omega (\log n/((\log ^*n)^2\log \log \log n))\), which is better than the earlier lower bound \(r_0/\tau =\Omega ({\log n}/{(\log ^* n)\log \log n})\).

By repeating this process \(\ell :=\tau /24-\log ^*\tau \) times, we end up with an algorithm that makes \(t:=\tau ^\ell s/(r_0r_1r_2r_3\cdots r_\ell )\) tests, where \(r_k=O(\log ^{[k+1]}n/\log ^{[k+2]}n)\) and \(\log ^{[i]}n={\log \log {\mathop {\cdots }\limits ^{i}}\log n}\). If \(t<1\), then the algorithm makes no tests and, with a probability of at least \(2/3-2(\tau /24)/\tau =7/12>1/2\), it can distinguish between two sets of defective items \(I_1\) and \(I_2\) that cannot have the same outcome. This gives the lower bound

$$\dfrac{r_0r_1r_2r_3\cdots r_\ell }{\tau ^{\tau /24}}=\dfrac{\log n}{(c\tau )^{\tau /24}}=\Omega \left( \dfrac{\log n}{(c\log ^* n)^{(\log ^*n)+1}}\right) $$

for some constant c.

1.1.3 Old attempt

A previous effort to establish this bound was undertaken in Bshouty (2018); however, a flaw was found in the proof. Consequently, the weaker bound of \(\Omega (\log n/\log \log n)\) was established and published in Bshouty (2019). In Bshouty (2018), Bshouty did not employ Algorithm 1. Rather, he applied the same analysis to \({\mathcal {Q}}_0^{(j)}\) (as opposed to \(\phi ({\mathcal {Q}}_0^{(j)})\)), which introduced numerous dependent events into the proof. The crucial ingredient of our analysis is the integration of the random permutation \(\phi \) into Algorithm 1. This permutation makes the events independent, allowing the repeated application of Algorithm 1 to \({\mathcal {Q}}_0^{(j)}\).

1.1.4 Any estimation \(\alpha \)

For estimates with a non-constant factor \(\alpha \) (such as \(\alpha =\log \log n\)), the algorithm of Bshouty (2019) can be readily adapted into one that makes \(O(\log n/\log \alpha )\) tests and, with probability at least 2/3, returns an estimate of the number of defective items to within a factor of \(\alpha \). The results of Ron and Tsur (2016) and of Bshouty (2019) can be expanded to establish a lower bound of

$$\Omega \left( \dfrac{\log n}{\log \max (\alpha , \log n)}\right) .$$

This lower bound is tight when \(\alpha \ge \log n\). In this paper, we show that if a constant j exists such that \(\alpha >\log ^{[j]}n\) (recall that \(\log ^{[j]}n:={\log \log {\mathop {\cdots }\limits ^{j}}\log n}\)), then any non-adaptive randomized algorithm that, with probability at least 2/3, estimates the number of defective items |I| to within a factor \(\alpha \) requires at least

$$\Omega \left( \dfrac{\log n}{\log \alpha }\right) .$$

In this case, the lower bound matches the upper bound.

The proof follows the same procedure as described above, with two modifications:

1. Rather than starting with \(\tau =\log ^*n\), we start with \(\tau =10k\), where k is the maximum integer for which \(\alpha > \log ^{[k]}n\).

2. When we reach stage \(k-1\), the set \(\mathcal{I}_{k-1}\) contains defective sets whose sizes differ by a factor of \(\log ^{[k-1]}n>\alpha \) and therefore cannot be distinguished by the estimation algorithm with no tests. However, \(\mathcal{I}_k\) will not contain such sets because \(\alpha >\log ^{[k]}n\). Thus, at stage k we partition \(\mathcal{Q}_{k-1}\) into \(r_{k-1}=O({\log ^{[k-1]}n}/\log \alpha )\) sets \({\mathcal {Q}}_{k-1}=\cup _{i=1}^{r_{k-1}}\mathcal{Q}_{k-1}^{(i)}\), where each \(\mathcal{Q}_{k-1}^{(i)}\) encompasses tests whose sizes fall within the interval \([n'_i, n'_{i+1}]\), where \(n'_{i+1} = \alpha ^2 \cdot n'_i\).

Those changes give the lower bound

$$\dfrac{r_0r_1r_2r_3\cdots r_{k-1}}{\tau ^{\tau /24}}=\dfrac{\log n}{(c\tau )^{\tau /24}\log \alpha }=\Omega \left( \dfrac{\log n}{\log \alpha }\right) .$$

2 Definitions and notation

In this section, we present various definitions and establish the notation used throughout.

We define the set of items as \(X = [n] = \{1, 2, \ldots , n\}\) and the set of defective items as \(I \subseteq X\). The algorithm is given n and has access to a test oracle \({\mathcal {O}}_I\). It can use this oracle to make a test \(Q\subseteq X\), with the oracle responding \({\mathcal {O}}_I(Q):=1\) if \(Q\cap I\not =\emptyset \), and \({\mathcal {O}}_I(Q):=0\) if \(Q\cap I=\emptyset \).

An algorithm A is said to \(\alpha \)-estimate the number of defective items with a probability of at least \(1-\delta \) if, for any set \(I\subseteq X\), A runs in time polynomial in n, uses the oracle \({\mathcal {O}}_I\) to make tests, and, with probability at least \(1-\delta \), outputs an integer D satisfying \(|I|\le D\le \alpha |I|\). If \(\alpha \) is a constant (independent of n), then we say that the algorithm estimates the number of defective items within a constant factor.

The algorithm is called non-adaptive if the tests do not depend on the outcomes of previous tests, allowing all tests to be conducted simultaneously in a single step. Our goal is to develop a non-adaptive algorithm that makes a minimum number of tests and, with a probability of at least \(1-\delta \), outputs an estimation of the number of defective items within a constant factor.

We define \(\log ^{[k]}n=\log \log {\mathop {\ldots }\limits ^{k}}\log n\) with \(\log ^{[0]}n=n\). It is noted that \(\log ^{[i+1]}n=\log \log ^{[i]}n\) and \(\log ^{[i-1]}n=2^{\log ^{[i]}n}\). Let \(\mathbb {N}=\{0,1,\cdots \}\). For two real numbers \(r_1,r_2\), we denote \([r_1,r_2]=\{r\in \mathbb {N}|r_1\le r\le r_2\}\). Random variables and random sets will be presented in bold.

3 The lower bound for constant estimation

In this section, we establish the lower bound for the number of tests required by any non-adaptive randomized algorithm that \(\alpha \)-estimates the number of defective items, for any constant \(\alpha \).

3.1 Lower bound for randomized algorithm

In this section, we prove the following.

Theorem 1

Let \(\tau =\log ^*n\) and \(\alpha \) be any constant. A non-adaptive randomized algorithm that \(\alpha \)-estimates the number of defective items with a probability of at least 2/3 is required to make at least

$$\Omega \left( \dfrac{\log n}{(480\tau )^{\tau +1}}\right) $$

tests.

We begin by proving the following.

Lemma 1

Let \(n_1=n\). Let \(i\ge 1\) be an integer such that \(\log ^{[i]}n\ge \tau :=\log ^*n\). Suppose there is an integer \(n_i=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_i\) that makes

$$\begin{aligned} s_i:=\dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \end{aligned}$$
(1)

tests and for every set of defective items I of size

$$d\in D_i:=\left[ \dfrac{n}{n_i},\dfrac{n(\log ^{[i-1]}n)^{1/4}}{n_i}\right] ,$$

with probability at least \(1-\delta \), \(\alpha \)-estimates d. Then there is an integer \(n_{i+1}=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_{i+1}\) that makes

$$\begin{aligned} s_{i+1}:=\dfrac{\log ^{[i+1]} n}{(480\tau )^{\tau -i+1}} \end{aligned}$$
(2)

tests and for every set of defective items I of size

$$d\in D_{i+1}:=\left[ \dfrac{n}{n_{i+1}},\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\right] ,$$

with probability at least \(1-\delta -1/(12\tau )\), \(\alpha \)-estimates d.

Proof

Let

$$N_{i}=\left[ \dfrac{n_i}{(\log ^{[i-1]}n)^{1/4}},n_i\right] .$$

We are interested in all the tests Q made by the algorithm \({\mathcal {A}}_i\) that satisfy \(|Q|\in N_i\). We will now partition \(N_i\) into smaller subsets. Let

$$N_{i,j}=\left[ \dfrac{n_i}{(\log ^{[i]}n)^{4j+4}},\dfrac{n_i}{(\log ^{[i]}n)^{4j}}\right] $$

where \(j\in [0,r_i-1]\) and

$$\begin{aligned} r_i=\dfrac{\log ^{[i]}n}{16\log ^{[i+1]}n}. \end{aligned}$$
(3)

Since the left endpoint of the interval \(N_{i,r_i-1}\) is

$$\dfrac{n_i}{(\log ^{[i]}n)^{4(r_i-1)+4}}=\dfrac{n_i}{2^{4r_i\log ^{[i+1]}n}}=\dfrac{n_i}{2^{(1/4)\log ^{[i]}n}}=\dfrac{n_i}{(\log ^{[i-1]}n)^{1/4}}$$

and the right endpoint of \(N_{i,0}\) is \(n_i\), we have, \(N_i=\cup _{j=0}^{r_i-1}N_{i,j}\).

Let \({\varvec{\mathcal {Q}}}=\{\varvec{Q}_1,\ldots ,\varvec{Q}_{s_i}\}\) be the random variable tests that the randomized algorithm \({\mathcal {A}}_i\) makes. Let \(\textbf{T}_j\) be a random variable representing the number of tests \(\varvec{Q}\in \varvec{\mathcal {Q}}\) that satisfy \(|\varvec{Q}|\in N_{i,j}\). Since \({\mathcal {A}}_i\) makes \(s_i\) tests, we have \(\textbf{T}_0+\cdots +\textbf{T}_{r_i-1}\le s_i\). Therefore, by (1) and (3) (in the expectation \(\textbf{E}_j\), j is uniformly random over \([0,r_i-1]\), and the other \(\textbf{E}\) is over the random seed of the algorithm \({\mathcal {A}}_i\)),

$$\textbf{E}_j\left[ \textbf{E}[\textbf{T}_j]\right] =\textbf{E}\left[ \textbf{E}_j[\textbf{T}_j]\right] \le \dfrac{s_i}{r_i}=\dfrac{16 \log ^{[i+1]}n}{(480\tau )^{\tau -i+2}}.$$

Therefore, there exists \(0\le j_i\le r_i-1\) that depends solely on the algorithm \({\mathcal {A}}_i\) (not the algorithm’s seed) such that

$$\textbf{E}[\textbf{T}_{j_i}]\le \frac{s_i}{r_i}=\dfrac{16 \log ^{[i+1]}n}{(480\tau )^{\tau -i+2}}.$$

By Markov’s bound, with probability at least \(1-16/(480\tau )=1-1/(30\tau )\),

$$\begin{aligned} |\{\textbf{Q}\in \varvec{\mathcal {Q}}:|\textbf{Q}|\in N_{i,j_i}\}|=\textbf{T}_{j_i}\le \frac{(480\tau )s_i}{16r_i}=\dfrac{\log ^{[i+1]}n}{(480\tau )^{\tau -i+1}}=s_{i+1}. \end{aligned}$$
(4)

Define

$$\begin{aligned} n_{i+1}=\dfrac{n_i}{(\log ^{[i]}n)^{4j_i+2}}. \end{aligned}$$
(5)

Since \(n_i=n^{\Omega (1)}\) and

$$(\log ^{[i]}n)^{4j_i+2}\le (\log ^{[i]}n)^{4r_i-2}=\dfrac{(\log ^{[i-1]}n)^{1/4}}{(\log ^{[i]}n)^2},$$

we have that \(n_{i+1}=n^{\Omega (1)}\le n\). Notice that this holds even for \(i=1\). This is because \(n_1=n\) and \((\log ^{[0]}n)^{1/4}=n^{1/4}\), so \(n_2\ge n^{3/4}\log ^2n=n^{\Omega (1)}\).

Consider the following randomized algorithm \({\mathcal {A}}_{i}'\):

Algorithm 2

Algorithm \(\mathcal{A}'_i\)

Consider the following algorithm \({\mathcal {A}}_{i+1}\):

Algorithm 3

Algorithm \({\mathcal {A}}_{i+1}\)

We now show that for every set of defective items I of size

$$\begin{aligned} d\in D_{i+1}:=\left[ \dfrac{n}{n_{i+1}},\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\right] , \end{aligned}$$
(6)

with probability at least \(1-\delta -1/(12\tau )\), algorithm \({\mathcal {A}}_{i+1}\) \(\alpha \)-estimates d using \(s_{i+1}\) tests.

In algorithm \({\mathcal {A}}_{i+1}\), Step 8 is the only step that makes tests. Therefore, by step 7 the test complexity of \({\mathcal {A}}_{i+1}\) is \(s_{i+1}\).

By the definition of \(D_i\), and since, by (5), \(n/n_{i+1}>n/n_i\), and, by (5) and (3)

$$\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}=\dfrac{n}{n_i}(\log ^{[i]}n)^{4j_i+2.25}\le \dfrac{n}{n_i}(\log ^{[i]}n)^{4r_i-1.75}\le \dfrac{n(\log ^{[i-1]}n)^{1/4}}{n_i},$$

we can conclude that \(D_{i+1}\subset D_i\).

Consider the following events:

1. Event \(M_0\): For some \(\varvec{Q}'=\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}'\) such that

    $$|\varvec{Q}'|\le \frac{n_i}{(\log ^{[i]}n)^{4j_i+4}}$$

    (i.e., \(\varvec{Q}'\in \varvec{\mathcal {Q}}_0\)), we have \(\varvec{Q}'\cap I\not =\emptyset \) (i.e., the answer to the test \(\varvec{Q}\) is 1).

2. Event \(M_1\): For some \(\varvec{Q}'=\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}'\) such that

    $$|\varvec{Q}'|\ge \frac{n_i}{(\log ^{[i]}n)^{4j_i}}$$

    (i.e., \(\varvec{Q}'\in \varvec{\mathcal {Q}}_1\)), we have \(\varvec{Q}'\cap I=\emptyset \) (i.e., the answer to the test \(\varvec{Q}\) is 0).

3. Event W:

    $$|\varvec{\mathcal {Q}}''|>s_{i+1}=\dfrac{\log ^{[i+1]}n}{(480\tau )^{\tau -i+1}}.$$

The success probability of the algorithm \({\mathcal {A}}_{i+1}\) on a set I of defective items with \(|I|=d\in D_{i+1}\) is (here the probability is over \(\varvec{\phi }\) and the random tests \(\varvec{\mathcal {Q}}\))

$$\begin{aligned} \textbf{Pr}[{\mathcal {A}}_{i+1}\text { succeeds on }I]&= \textbf{Pr}[({\mathcal {A}}_{i}'\text { succeeds on }I)\wedge \overline{M_0}\wedge \overline{M_1}\wedge \overline{W}]\\&\ge \textbf{Pr}[{\mathcal {A}}_i'\text { succeeds on }I]-\textbf{Pr}[M_0\vee M_1\vee W]\\&\ge \textbf{Pr}[{\mathcal {A}}_i'\text { succeeds on }I]-\textbf{Pr}[M_0]-\textbf{Pr}[M_1]-\textbf{Pr}[W]. \end{aligned}$$

Now, since \(|\varvec{\phi }^{-1}(I)|=|I|\) and \(\varvec{Q}'\cap I=\varvec{\phi }(\varvec{Q})\cap I\not =\emptyset \) if and only if \(\varvec{Q}\cap \varvec{\phi }^{-1}(I)\not =\emptyset \),

$$ \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[{\mathcal {A}}_i' \text{ succeeds } \text{ on } I]=\underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[{\mathcal {A}}_i \text{ succeeds } \text{ on } \phi ^{-1}(I)]\ge 1-\delta . $$

Therefore, to get the result, it is enough to show that \(\textbf{Pr}[M_0]\le 1/(300\tau )\), \(\textbf{Pr}[M_1]\le 1/(300\tau )\), and \(\textbf{Pr}[W]\le 1/(30\tau )\).

First, since \(|\varvec{Q}'|=|\varvec{\phi }(\varvec{Q})|=|\varvec{Q}|\) we have

$$|\varvec{\mathcal {Q}}''|=|\{\varvec{Q}_i':|\varvec{Q}_i'|\in N_{i,j_i}\}|=|\{\varvec{Q}_i:|\varvec{Q}_i|\in N_{i,j_i}\}|=\textbf{T}_{j_i}.$$

By (4), with probability at most \(1/(30\tau )\),

$$|\varvec{\mathcal {Q}}''|=\textbf{T}_{j_i}>\dfrac{\log ^{[i+1]}n}{(480\tau )^{\tau -i+1}}.$$

Therefore, \(\textbf{Pr}[W]\le 1/(30\tau )\).

We now show that \(\textbf{Pr}[M_0]\le 1/(300\tau )\). We have (a detailed explanation of every step follows):

$$\begin{aligned} \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[M_0]= & \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[(\exists \varvec{Q}\in \varvec{\mathcal {Q}},\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}_0) \ \varvec{\phi }(\varvec{Q})\cap I\not =\emptyset ]\end{aligned}$$
(7)
$$\begin{aligned}= & \underset{\ \varvec{\phi },\varvec{\mathcal {Q}}}{\textbf{Pr}}[(\exists \varvec{Q}\in \varvec{\mathcal {Q}},\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}_0) \ \varvec{Q}\cap \varvec{\phi }^{-1}(I)\not =\emptyset ]\end{aligned}$$
(8)
$$\begin{aligned}\le & s_i \left( 1-\prod _{k=0}^{d-1}\left( 1-\dfrac{n_i}{(\log ^{[i]}n)^{4j_i+4}(n-k)}\right) \right) \end{aligned}$$
(9)
$$\begin{aligned}\le & s_i \left( 1-\left( 1-\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n}\right) ^d\right) \end{aligned}$$
(10)
$$\begin{aligned}\le & s_i d\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n}\end{aligned}$$
(11)
$$\begin{aligned}\le & \dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \cdot \dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n} \end{aligned}$$
(12)
$$\begin{aligned}= & \dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \cdot \dfrac{n(\log ^{[i]}n)^{4j_i+2\frac{1}{4}}}{n_{i}}\dfrac{2n_i}{(\log ^{[i]}n)^{4j_i+4}n} \end{aligned}$$
(13)
$$\begin{aligned}= & \dfrac{2}{(480\tau )^{\tau -i+2}(\log ^{[i]}n)^{3/4}} \le \dfrac{1}{300\tau }. \end{aligned}$$
(14)

(7) is derived from the definition of the event \(M_0\). (8) follows from the observation that, for any permutation \(\phi :[n]\rightarrow [n]\) and two subsets \(X,Y\subseteq [n]\), the condition \(\phi (X)\cap Y\not =\emptyset \) is equivalent to \(X\cap \phi ^{-1}(Y)\not =\emptyset \). (9) follows from:

1. The application of the union bound and the fact that \(|\varvec{\mathcal {Q}}_0|\le |\varvec{\mathcal {Q}}|= s_i\).

2. For a uniformly random \(\varvec{\phi }\) and a d-subset \(I\subset [n]\), \(\varvec{\phi }^{-1}(I)\) is a uniformly random d-subset of [n].

3. For every \(\varvec{Q}\in \varvec{\mathcal {Q}}\) for which \(\varvec{\phi }(\varvec{Q})\in \varvec{\mathcal {Q}}_0\), we have

    $$|\varvec{Q}|=|\varvec{\phi }(\varvec{Q})|\le {n_i}/{(\log ^{[i]}n)^{4j_i+4}}.$$

(10) follows because \(d\in D_{i+1}\) and \(n_{i+1}=n^{\Omega (1)}\) imply, by (6), that \(d\le n/2\). (11) follows from the inequality \((1-x)^d\ge 1-dx\). (12) follows from (1) and (6). (13) follows from (5). Finally, (14) follows from the fact that, since \(\log ^{[i]}n\ge \tau =\log ^*n\), it holds that \(i\le \tau \), and therefore \((480\tau )^{\tau -i+2}\ge (480\tau )^{2}\ge 600\tau \).
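For the reader checking the arithmetic, the exponent bookkeeping behind (13) and (14) is:

```latex
\frac{\log^{[i]} n}{(480\tau)^{\tau-i+2}}
  \cdot \frac{n\,(\log^{[i]}n)^{4j_i+\frac{9}{4}}}{n_i}
  \cdot \frac{2 n_i}{(\log^{[i]}n)^{4j_i+4}\, n}
  = \frac{2\,(\log^{[i]}n)^{1+4j_i+\frac{9}{4}-4j_i-4}}{(480\tau)^{\tau-i+2}}
  = \frac{2}{(480\tau)^{\tau-i+2}(\log^{[i]}n)^{3/4}}.
```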

We now demonstrate that \(\textbf{Pr}[M_1] \le \frac{1}{300\tau }\).

$$\begin{aligned} \textbf{Pr}_{\varvec{\phi },\varvec{\mathcal {Q}}}[M_1]= & \textbf{Pr}_{\varvec{\phi },\varvec{\mathcal {Q}}}\left[ \left( \exists \varvec{Q} \in \varvec{\mathcal {Q}} \text { s.t. } \varvec{\phi }(\varvec{Q}) \in \varvec{\mathcal {Q}}_1\right) \wedge \left( \varvec{\phi }(\varvec{Q}) \cap I = \emptyset \right) \right] \end{aligned}$$
(15)
$$\begin{aligned}= & \textbf{Pr}_{\varvec{\phi },\varvec{\mathcal {Q}}}\left[ \left( \exists \varvec{Q} \in \varvec{\mathcal {Q}} \text { s.t. } \varvec{\phi }(\varvec{Q}) \in \varvec{\mathcal {Q}}_1\right) \wedge \left( \varvec{Q} \cap \varvec{\phi }^{-1}(I) = \emptyset \right) \right] \end{aligned}$$
(16)
$$\begin{aligned}\le & s_i \prod _{k=0}^{d-1}\left( 1 - \frac{n_i}{(\log ^{[i]} n)^{4j_i}(n-k)}\right) \end{aligned}$$
(17)
$$\begin{aligned}\le & s_i \left( 1 - \frac{n_i}{(\log ^{[i]} n)^{4j_i} n}\right) ^d\nonumber \\\le & s_i \exp \left( -\frac{dn_i}{(\log ^{[i]} n)^{4j_i} n}\right) \end{aligned}$$
(18)
$$\begin{aligned}\le & \frac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \exp \left( -\frac{\frac{n}{n_{i+1}} \frac{n_i}{(\log ^{[i]} n)^{4j_i}}}{n}\right) \end{aligned}$$
(19)
$$\begin{aligned}\le & \frac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}} \exp \left( -(\log ^{[i]} n)^2\right) \end{aligned}$$
(20)
$$\begin{aligned}\le & \frac{1}{300\tau }. \end{aligned}$$
(21)

(15) is derived from the definition of the event \(M_1\). (16) follows from the observation that for any permutation \(\phi :[n]\rightarrow [n]\) and two subsets \(X,Y\subseteq [n]\), the condition \(\phi (X)\cap Y = \emptyset \) is equivalent to \(X\cap \phi ^{-1}(Y) = \emptyset \). In (17), we again apply the union bound, \(|\varvec{\mathcal {Q}}_1| \le |\varvec{\mathcal {Q}}| = s_i\), the fact that \(\phi ^{-1}(I)\) is a random uniform \(d\)-subset, and for \(\varvec{Q}' \in \varvec{\mathcal {Q}}_1\), \(|\varvec{Q}'| \ge {n_i}/{(\log ^{[i]} n)^{4j_i}}\). (18) follows from the inequalities \(\left( 1 - {y}/{(n-k)}\right) \le \left( 1 - {y}/{n}\right) \) and \(1-x \le e^{-x}\) for all \(x\) and \(y \ge 0\). (19) follows from (1) and (6). (20) is based on (5). (21) follows from the fact that \(i \le \tau \), implying that \((480\tau )^{\tau -i+2} \ge (480\tau )^{2} \ge 300\tau \).
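Explicitly, the substitution of (5) that justifies step (20) is:

```latex
\frac{\frac{n}{n_{i+1}}\cdot\frac{n_i}{(\log^{[i]}n)^{4j_i}}}{n}
  = \frac{n_i}{n_{i+1}\,(\log^{[i]}n)^{4j_i}}
  = \frac{(\log^{[i]}n)^{4j_i+2}}{(\log^{[i]}n)^{4j_i}}
  = (\log^{[i]}n)^2 .
```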

This completes the proof. \(\square \)

We are now ready to prove Theorem 1.

Proof

Assume, to the contrary, that there exists a non-adaptive randomized algorithm \({\mathcal {A}}_1\) which, with a probability of at least \(2/3\), \(\alpha \)-estimates the number of defective items while making

$$ m:= \frac{\log n}{(480\tau )^{\tau +1}} $$

tests. Note that \(n_1 = n\) and \(\log ^{[0]} n = n\). We will apply Lemma 1 with \(\delta = 1/3\), \(D_1 = [1, n^{1/4}]\), and \(s_1 = m\).

Let \(\ell \) be an integer such that \(\log \log ^* n < \log ^{[\ell ]} n \le \log ^* n = \tau \). Consequently, we have \(\log ^{[\ell -1]} n \ge 2^{\log ^{[\ell ]} n} > 2^{\log \log ^* n} = \log ^* n = \tau \). We can now apply Lemma 1 repeatedly for \(i = 1, \ldots , \ell - 1\) to derive

$$ s_\ell = \frac{\log ^{[\ell ]} n}{(480\tau )^{\tau - \ell + 2}} \le \frac{\tau }{(480\tau )^2} < 1. $$

Consequently, the algorithm \({\mathcal {A}}_\ell \) performs no tests and achieves an \(\alpha \)-estimation of the size of the defective set \(I\) with probability at least \(2/3 - \frac{\ell }{12\tau } \ge \frac{7}{12} > \frac{1}{2}\), given that

$$ |I| \in D_\ell = \left[ \frac{n}{n_\ell }, \frac{n(\log ^{[\ell -1]} n)^{1/4}}{n_\ell }\right] . $$

Specifically, with a probability exceeding \({1}/{2}\), we are able to differentiate between defective sets of size \({n}/{n_\ell }\) and those larger than \(\alpha {n}/{n_\ell }\) without conducting any tests, which is not feasible.

This results in a contradiction. \(\square \)
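The quantitative heart of the argument above, that \(s_\ell \) drops below 1, can be illustrated for a concrete \(n\). The following Python sketch (ours; it uses the integer floor of \(\log _2\) as a stand-in for \(\log \), and the helper names are ours) computes \(\tau = \log ^* n\), selects \(\ell \), and checks that \(s_\ell < 1\):

```python
# Numerical illustration (ours, not part of the proof): with ilog2 as an
# integer stand-in for log, compute tau = log* n, pick ell with
# log(log* n) < log^[ell] n <= log* n, and check that
# s_ell = log^[ell] n / (480*tau)^(tau - ell + 2) is below 1.
def ilog2(n: int) -> int:
    """Floor of log2(n) via bit length (exact for big integers)."""
    return n.bit_length() - 1

def iter_log(n: int, i: int) -> int:
    """log^[i] n: apply ilog2 i times."""
    for _ in range(i):
        n = ilog2(n)
    return n

def log_star(n: int) -> int:
    """log* n: number of ilog2 applications needed to reach 1."""
    count = 0
    while n > 1:
        n = ilog2(n)
        count += 1
    return count

n = 2 ** 65536
tau = log_star(n)                 # tau = 5 for this n
ell = next(i for i in range(1, tau + 1)
           if ilog2(tau) < iter_log(n, i) <= tau)   # here ell = 3
# s_ell < 1 iff its numerator is smaller than its denominator
assert iter_log(n, ell) < (480 * tau) ** (tau - ell + 2)
```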

4 The lower bound for \(\alpha \)-estimation

In this section, we sketch the proof of the following tight lower bound.

Theorem 2

Let \(\alpha _n\) be any function of n such that there exists a constant k (independent of n) with \(\log ^{[k-1]}n>\alpha _n \ge \log ^{[k]}n\). Any non-adaptive randomized algorithm that \(\alpha _n\)-estimates the number of defective items with a probability of at least 2/3 makes at least

$$\Omega \left( \dfrac{\log n}{\log \alpha _n}\right) $$

tests.

We now sketch the proof.

First, it is sufficient to prove the result for

$$\begin{aligned} (\log ^{[k-1]}n)^{1/c}>\alpha \ge \log ^{[k]}n \end{aligned}$$
(22)

for a sufficiently large constant c and any constant k. This is because any \(\alpha \)-estimation with \(\log ^{[k-1]}n\ge \alpha \ge (\log ^{[k-1]}n)^{1/c}\) is also a \((\log ^{[k-1]}n)\)-estimation, and in this range \(\log \alpha = \Theta (\log ^{[k]}n)\), so the two lower bounds agree up to a constant factor.

Let \(\alpha =\alpha _n\) and \(\tau =2k\). Notice that \(\tau =O(1)\). Assume that there is an algorithm that makes

$$m:=\frac{\log n}{(480\tau )^{\tau +1}\log \alpha }$$

tests and with probability at least 2/3 returns an \(\alpha \)-estimation of the number of defective items.

We first use the following lemma, whose proof is identical to that of Lemma 1.

Lemma 2

Let \(n_1=n\). Let \(i\ge 1\) be an integer such that \(\log ^{[i+1]}n\ge \alpha \). Suppose there is an integer \(n_i=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_i\) that makes

$$\begin{aligned} s_i:=\dfrac{\log ^{[i]} n}{(480\tau )^{\tau -i+2}\log \alpha } \end{aligned}$$
(23)

tests and for every set of defective items I of size

$$d\in D_i:=\left[ \dfrac{n}{n_i},\dfrac{n(\log ^{[i-1]}n)^{1/4}}{n_i}\right] ,$$

with probability at least \(1-\delta \), \(\alpha \)-estimates d. Then there is an integer \(n_{i+1}=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_{i+1}\) that makes

$$\begin{aligned} s_{i+1}:=\dfrac{\log ^{[i+1]} n}{(480\tau )^{\tau -i+1}\log \alpha } \end{aligned}$$
(24)

tests and for every set of defective items I of size

$$d\in D_{i+1}:=\left[ \dfrac{n}{n_{i+1}},\dfrac{n(\log ^{[i]}n)^{1/4}}{n_{i+1}}\right] ,$$

with probability at least \(1-\delta -1/(12\tau )\), \(\alpha \)-estimates d.

We apply the above lemma repeatedly, starting from \(s_1=m\) with \(i=1\) and continuing up to \(i=k-1\), to derive an algorithm \(\mathcal{A}_k\). This algorithm makes

$$s_k:=\dfrac{\log ^{[k]} n}{(480\tau )^{\tau -k+2}\log \alpha }$$

tests and for every set of defective items \(I\) of size

$$d\in D_k:=\left[ \dfrac{n}{n_k},\dfrac{n(\log ^{[k-1]}n)^{1/4}}{n_k}\right] ,$$

with probability \(2/3-(k-1)/(12\tau )\ge 5/8\), \(\alpha \)-estimates d. We note that the constraint in (22) ensures \(s_k > 1\), which is required for our analysis.

We then prove the following lemma, whose proof, sketched below, is similar to that of Lemma 1.

Lemma 3

Let \(n_1=n\). Suppose there is an integer \(n_k=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_k\) that makes

$$\begin{aligned} s_k:=\dfrac{\log ^{[k]} n}{(480\tau )^{\tau -k+2}\log \alpha } \end{aligned}$$
(25)

tests and for every set of defective items I of size

$$d\in D_k:=\left[ \dfrac{n}{n_k},\dfrac{n(\log ^{[k-1]}n)^{1/4}}{n_k}\right] ,$$

with probability at least 5/8, \(\alpha \)-estimates d. Then there is an integer \(n_{k+1}=n^{\Omega (1)}\le n\) and a non-adaptive randomized algorithm \({\mathcal {A}}_{k+1}\) that makes

$$\begin{aligned} s_{k+1}:=\dfrac{4}{(480\tau )^{\tau -k+1}} \end{aligned}$$
(26)

tests and for every set of defective items I of size

$$d\in D_{k+1}:=\left[ \dfrac{n}{n_{k+1}},\dfrac{n\alpha ^4}{n_{k+1}}\right] ,$$

with probability at least \(5/8-1/(12\tau )\ge 9/16\), \(\alpha \)-estimates d.

Before we sketch the proof, we show how this lemma completes the proof of Theorem 2.

Recall that \(\tau =2k\). Notice that \(s_{k+1}<1\) and, therefore, algorithm \(\mathcal{A}_{k+1}\) makes no tests and with probability \(9/16>1/2\) can distinguish between a defective set of size \((n/n_{k+1})\in D_{k+1}\) and one of size \(\alpha ^2(n/n_{k+1})\in D_{k+1}\), which leads to a contradiction. Therefore, any non-adaptive randomized algorithm that \(\alpha \)-estimates the number of defective items with a probability of at least 2/3 makes at least

$$m=\Omega \left( \frac{\log n}{(480\tau )^{\tau +1}\log \alpha }\right) =\Omega \left( \frac{\log n}{\log \alpha }\right) $$

tests.

To prove Lemma 3, we use the same proof as for Lemma 1 with the following changes.

We have

$$N_{k}=\left[ \dfrac{n_k}{(\log ^{[k-1]}n)^{1/4}},n_k\right] $$

as in Lemma 1 but the partition will be into the following sets

$$N_{k,j}=\left[ \frac{n_k}{\alpha ^{16j+16}},\frac{n_k}{\alpha ^{16j}}\right] $$

where \(j\in [0,r_k-1]\) and (see (3))

$$r_k=\frac{\log ^{[k]}n}{64\log \alpha }.$$

Then, as in the proof of Lemma 1, there exists a \(j_k\) such that, with probability at least \(1-1/(30\tau )\), the number of tests with sizes in \(N_{k,j_k}\) is at most (see (4))

$$s_{k+1}=\frac{(480\tau )s_k}{16r_k}=\dfrac{4}{(480\tau )^{\tau -k+1}}.$$
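For completeness, the last equality follows by substituting (25) and the definition of \(r_k\):

$$s_{k+1}=\frac{(480\tau )s_k}{16r_k}=\frac{480\tau }{16}\cdot \frac{\log ^{[k]} n}{(480\tau )^{\tau -k+2}\log \alpha }\cdot \frac{64\log \alpha }{\log ^{[k]} n}=\frac{64\cdot 480\tau }{16\,(480\tau )^{\tau -k+2}}=\frac{4}{(480\tau )^{\tau -k+1}}.$$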

We now define (see (5))

$$n_{k+1}=\frac{n_k}{\alpha ^{16j_k+8}}.$$

Then \(\varvec{\mathcal {Q}}_0\) and \(\varvec{\mathcal {Q}}_1\) in Algorithm 3 are defined as follows:

$$\varvec{\mathcal {Q}}_0:=\left\{ \varvec{Q}'\in \varvec{\mathcal {Q}}':|\varvec{Q}'|\le \frac{n_k}{\alpha ^{16j_k+16}}\right\} ,$$

and

$$\varvec{\mathcal {Q}}_1:=\left\{ \varvec{Q}'\in \varvec{\mathcal {Q}}':|\varvec{Q}'|\ge \frac{n_k}{\alpha ^{16j_k}}\right\} .$$

Now it is straightforward to verify (7)-(14) and (15)-(21), which completes the proof of the lemma.

5 Conclusion

In this paper, we introduce a novel methodology for deriving lower bounds in non-adaptive randomized group testing. Our approach yields a lower bound of

$$ \Omega \left( \frac{\log n}{(c \log ^* n)^{\log ^* n}}\right) $$

for some constant \(c\) on the test complexity of any non-adaptive randomized algorithm that estimates the number of defective items within a constant factor. Furthermore, we prove a tight lower bound of

$$ \Omega \left( \frac{\log n}{\log \alpha }\right) $$

for \(\alpha \)-estimation when \(\alpha > \log ^{[j]} n\) for a given constant \(j\). These two lower bounds represent a significant improvement over the prior bound of

$$ \Omega \left( \frac{\log n}{\log \log n}\right) $$

established in Bshouty (2019); Ron and Tsur (2016).

The key innovation in our work was the incorporation of a random permutation \(\phi \) in Algorithm 1, which enabled us to achieve independence among events and allowed for a repeated analysis of the set \({\mathcal {Q}}^{(j)}\). This pivotal step helped us circumvent the limitations of earlier techniques and attain a stronger lower bound.

An intriguing open question remains: establishing a lower bound of \(\Omega (\log n)\) for the test complexity in non-adaptive randomized algorithms that estimate the number of defective items within a constant factor.