Abstract
The group testing problem asks to find d≪n defective elements out of n elements, by testing subsets (pools) for the presence of defectives. In the strict model of group testing, the goal is to identify all defectives if at most d defectives exist, and otherwise to report that more than d defectives are present. If tests are time-consuming, they should be performed in a small constant number s of stages of parallel tests. It is known that a test number O(d log n), which is optimal up to a constant factor, can be achieved already in s=2 stages. Here we study two aspects of group testing that have not found major attention before. (1) Asymptotic bounds on the test number do not yet lead to optimal strategies for specific n, d, s. Especially for small n we show that randomized strategies significantly save tests on average, compared to worst-case deterministic results. Moreover, the only type of randomness needed is a random permutation of the elements. We solve the problem of constructing optimal randomized strategies for strict group testing completely for the case when d=1 and s≤2. A byproduct of our analysis is that optimal deterministic strategies for strict group testing for d=1 need at most 2 stages. (2) Usually, an element may participate in several pools within a stage. However, when the elements are indivisible objects, every element can belong to at most one pool at the same time. For group testing with disjoint simultaneous pools we show that Θ(sd(n/d)^{1/s}) tests are sufficient and necessary. While the strategy is simple, the challenge is to derive tight lower bounds for different s and different ranges of d versus n.
Notes
Note that this strategy is deterministic up to a random permutation; however, we do not need optimality as in Theorem 2.
References
Aigner, M.: Combinatorial Search. Wiley-Teubner, New York (1988)
Balding, D.J., Torney, D.C.: Optimal pooling designs with error detection. J. Comb. Theory, Ser. A 74, 131–140 (1996)
Bar-Lev, S.K., Boneh, A., Perry, D.: Incomplete identification models for group-testable items. Nav. Res. Logist. 37, 647–659 (1990)
Cheng, Y., Du, D.Z.: New constructions of one- and two-stage pooling designs. J. Comput. Biol. 15, 195–205 (2008)
Cicalese, F., Damaschke, P., Vaccaro, U.: Optimal group testing strategies with interval queries and their application to splice site detection. Int. J. Bioinform. Res. Appl. 1, 363–388 (2005)
Cicalese, F., Damaschke, P., Tansini, L., Werth, S.: Overlaps help: improved bounds for group testing with interval queries. Discrete Appl. Math. 155, 288–299 (2007)
Clements, G.F.: The minimal number of basic elements in a multiset antichain. J. Comb. Theory, Ser. A 25, 153–162 (1978)
Colbourn, C.J., Dinitz, J.H.: The CRC Handbook of Combinatorial Designs. CRC Press, Boca Raton (1996)
Dachman-Soled, D., Servedio, R.: A canonical form for testing Boolean function properties. In: Goldberg, L.A., Jansen, K., Ravi, R., Rolim, J.D.P. (eds.) Approximation, Randomization, and Combinatorial Optimization, Algorithms and Techniques APPROX-RANDOM 2011. LNCS, vol. 6845, pp. 460–471. Springer, Heidelberg (2011)
Damaschke, P., Sheikh Muhammad, A.: Randomized group testing both query-optimal and minimal adaptive. In: Bieliková, M., Friedrich, G., Gottlob, G., Katzenbeisser, S., Turán, G. (eds.) 38th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2012. LNCS, vol. 7147, pp. 214–225. Springer, Heidelberg (2012)
De Bonis, A., Di Crescenzo, G.: Combinatorial group testing for corruption localizing hashing. In: Fu, B., Du, D.Z. (eds.) Computing and Combinatorics, COCOON 2011. LNCS, vol. 6842, pp. 579–591. Springer, Heidelberg (2011)
De Bonis, A., Gasieniec, L., Vaccaro, U.: Optimal two-stage algorithms for group testing problems. SIAM J. Comput. 34, 1253–1270 (2005)
Du, D.Z., Hwang, F.K.: Combinatorial Group Testing and Its Applications. Series on Appl. Math., vol. 18. World Scientific, Singapore (2000)
Du, D.Z., Hwang, F.K.: Pooling Designs and Nonadaptive Group Testing. Series on Appl. Math., vol. 18. World Scientific, Singapore (2006)
Dyachkov, A.G., Rykov, V.V.: Bounds on the length of disjunctive codes. Probl. Inf. Transm. 18, 7–13 (1982) (in Russian)
Edmonds, J.: Matroids and the greedy algorithm. Math. Program. 1, 127–136 (1971)
Eppstein, D., Goodrich, M.T., Hirschberg, D.S.: Improved combinatorial group testing algorithms for real-world problem sizes. SIAM J. Comput. 36, 1360–1375 (2007)
Fang, J., Jiang, Z.L., Yiu, S.M., Hui, L.C.K.: An efficient scheme for hard disk integrity check in digital forensics by hashing with combinatorial group testing. Int. J. Digit. Content Technol. Appl. 5, 300–308 (2011)
Fischer, P., Klasner, N., Wegener, I.: On the cut-off point for combinatorial group testing. Discrete Appl. Math. 91, 83–92 (1999)
Goldreich, O., Trevisan, L.: Three theorems regarding testing graph properties. Random Struct. Algorithms 23, 23–57 (2003)
Goodrich, M.T., Hirschberg, D.S.: Improved adaptive group testing algorithms with applications to multiple access channels and dead sensor diagnosis. J. Comb. Optim. 15, 95–121 (2008)
Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. IRE 40, 1098–1101 (1952)
Huang, S.H., Hwang, F.K.: When is individual testing optimal for nonadaptive group testing? SIAM J. Discrete Math. 14, 540–548 (2001)
Li, C.H.: A sequential method for screening experimental variables. J. Am. Stat. Assoc. 57, 455–477 (1962)
Lubell, D.: A short proof of Sperner’s lemma. J. Comb. Theory 1, 299 (1966)
Mézard, M., Toninelli, C.: Group testing with random pools: optimal two-stage algorithms. IEEE Trans. Inf. Theory 57, 1736–1745 (2011)
Ruszinkó, M.: On the upper bound of the size of the r-cover-free families. J. Comb. Theory, Ser. A 66, 302–310 (1994)
Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math. Z. 27, 544–548 (1928) (in German)
Schlaghoff, J., Triesch, E.: Improved results for competitive group testing. Comb. Probab. Comput. 14, 191–202 (2005)
Welsh, D.J.A.: Matroid Theory. Academic Press, San Diego (1976)
Acknowledgements
The work of the first two authors has been supported by the Swedish Research Council (Vetenskapsrådet), through grant 2010-4661, “Generalized and fast search strategies for parameterized problems”. The first author also received support from RWTH Aachen during a visit in 2011. We thank the referees of ICALP2011GT and of this Special Issue for their careful remarks and suggestions.
Additional information
Part of the paper has been presented by the first author at the ICALP Workshop on Algorithms and Data Structures for Selection, Identification and Encoding ICALP2011GT, Zürich.
Appendix A
A.1 Interpretation of Group Testing as a Game
Here we give a more formal presentation of the game-theoretic setting we use for proving lower bounds. In the group testing problem we are searching for some unknown subset P of an n-element set I of elements, and we can choose pools Q⊂I in order to test whether Q∩P is empty or not.
A deterministic s-stage group testing strategy A is a function which successively selects sets \(T_i:=\{Q_{i,1},\ldots,Q_{i,t_i}\}\) of pools, 1≤i≤s. Given some subset P⊂I, the result of test stage i is given by the vector \(r_i:=(r_{i,1},\ldots,r_{i,t_i})\), where \(r_{i,j}:=0\) if \(Q_{i,j}\cap P=\emptyset\), and \(r_{i,j}:=1\) otherwise. The strategy may use the results of stages 1,…,i−1 when selecting \(T_i\); hence \(T_i\) may depend on P if i>1. The cost of stage i is the number of tests, \(t_i=t_i(P)\).
A strategy A solves the strict (d,n) group testing problem if, for each P⊂I with |P|≤d, the sequence of results r_i, 1≤i≤s, uniquely determines P as a subset of I. (In particular, if the strategy is applied to a set P of cardinality larger than d, the test results show that |P|>d).
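To make this definition concrete for the nonadaptive special case s=1, the following minimal Python sketch (our own illustration, not part of the paper; the function names are hypothetical) checks by brute force whether a given one-stage pool design solves the strict (d,n) problem: the outcome vector of every set P with |P|≤d must not be produced by any other subset of the ground set, large or small.

```python
from itertools import chain, combinations

def outcome(pools, P):
    """Result vector of a single stage: 1 iff the pool intersects P."""
    return tuple(int(bool(set(Q) & set(P))) for Q in pools)

def solves_strict(pools, n, d):
    """Brute-force check (feasible only for small n) that a one-stage
    pool design solves the strict (d, n) group testing problem."""
    all_subsets = chain.from_iterable(
        combinations(range(n), r) for r in range(n + 1))
    seen = {}                       # outcome vector -> subsets producing it
    for P in all_subsets:
        seen.setdefault(outcome(pools, P), []).append(set(P))
    for group in seen.values():
        small = [P for P in group if len(P) <= d]
        # a set of size <= d must be the unique producer of its outcome
        if small and len(group) > 1:
            return False
    return True

if __name__ == "__main__":
    # individual testing of 4 elements trivially solves strict (1, 4)
    print(solves_strict([[0], [1], [2], [3]], n=4, d=1))   # True
    # a single pool containing everything does not
    print(solves_strict([[0, 1, 2, 3]], n=4, d=1))         # False
```

The exhaustive check mirrors the definition literally: a defective set of size at most d must be uniquely recoverable from the results, and any larger set must be recognizable as such.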
We denote by \(\mathcal{A}\) the finite set of all deterministic s-stage strategies solving the strict (d,n) group testing problem. Moreover, we denote by \(t(A,P):=\sum_{i=1}^{s} t_i(P)\) the cost of A, given P, and by \(t(A):=\max\{t(A,P): P\subset I, |P|\le d\}\) the worst-case cost of strategy A. Now, \(t(n,d,s):=\min_{A\in {\mathcal{A}}}t(A)\).
An adversary strategy S is a function which, given a sequence \(r_j\), 1≤j≤i−1, of possible results for an s-stage group testing algorithm and some set \(T_i\) of pools, chooses some vector \(r_i=S(r_1,\ldots,r_{i-1},T_i)\) of results for the tests in \(T_i\). In what follows, we consider only adversary strategies generating results which are consistent with at least one subset P⊂I, and denote the (finite) set of all such strategies by \(\mathcal{S}\).
Given some adversary strategy \(S\in \mathcal{S}\) and an s-stage group testing strategy \(A\in \mathcal{A}\), we define t(A,S) as the total number of tests used if A is applied to the results generated by S, while S is applied to the test sets generated by A.
The number t(A,S) can be interpreted as the total number of tests used in a game between two players, the searcher and the hider. The searcher wants to find P, and the tests she uses are given by algorithm A. The hider provides results for the tests by using her strategy S. The number \(t(S):=\min_{\mathcal{A}}t(A,S)\) denotes the worst-case cost that the hider can enforce by using strategy S.
Clearly, we have \(t(A)=\max_{S\in \mathcal{S}}t(A,S)\), hence \(\min_{\mathcal{A}}t(A)\geq \max_{\mathcal{S}}t(S)\). In fact, it is well known that equality holds:
\[\min_{A\in\mathcal{A}}t(A)\;=\;\max_{S\in\mathcal{S}}t(S).\]
This can be proved by induction on s by slightly modifying the proof of Proposition 1.31 in [1] and is a special case of the Minimax theorem of Game Theory. Hence t(S) is a lower bound for t(n,d,s) for each adversary strategy S.
In our arguments, we also use a variant of the strategies discussed above: Imagine that, in stage i, the pool set \(T_i\) is tested. Instead of providing results for the tests \(Q_{i,j}\) only, the hider might be allowed to reveal more information for free, that is, add positive or negative test sets \(Q_{i,j}\) to \(T_i\), which are not counted as tests in computing t(A,S). (Of course, the answers must remain consistent with some set P.) For a strategy S in this generalized sense, the inequality t(S)≤t(n,d,s) holds a fortiori.
A randomized strategy is now given by a probability distribution π on \(\mathcal{A}\). Its expected cost is defined as \(E[\pi]:=\sum_{A\in\mathcal{A}}\pi (A)t(A)\). With analogous definitions for randomized adversary strategies τ, that is, probability distributions on \(\mathcal{S}\), the Minimax theorem of Game Theory yields:
\[\min_{\pi}\max_{S\in\mathcal{S}}\sum_{A\in\mathcal{A}}\pi(A)\,t(A,S)\;=\;\max_{\tau}\min_{A\in\mathcal{A}}\sum_{S\in\mathcal{S}}\tau(S)\,t(A,S).\]
A.2 Extremal Matchings in Balanced Weighted Bipartite Graphs
In this section we prove the Claim from Sect. 3.4, provided that m≥2k−1 (equivalently, d≤(q+3)/2). It was already stated and motivated there, so we can now focus on the proof. We presume familiarity with the notion of a matroid; otherwise we refer to [30].
The transversal matroid of a bipartite graph B=(U,V;E) has as independent sets exactly those sets S⊆V for which there exists a matching that covers S. It is well known that, in any matroid, a maximum-weight basis (maximal independent set) can be computed by a greedy algorithm due to [16]. It works as follows in our case of a transversal matroid, for a given B=(U,V;E) and π: Sort the vertices in V by non-ascending weights π(v), and start with S:=∅. Scan this sorted list and put v into the basis S if and only if the resulting set S:=S∪{v} remains independent; otherwise leave S unchanged. (The greedy algorithm has to call an “oracle” that checks whether some matching covers S.) We denote by g(π) the weight π(S) of a maximum-weight basis S.
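As an illustration, here is a minimal Python sketch of this greedy procedure (our own code, not from the paper; all names are hypothetical). The independence oracle is implemented by Kuhn's augmenting-path algorithm, which checks whether the current candidate set can be covered by a matching.

```python
def can_cover(cand, adj):
    """Independence oracle: can some matching cover every vertex in cand?
    adj[v] is the set of U-neighbours of v (Kuhn's augmenting paths)."""
    match_u = {}                          # u -> v currently matched to u
    def augment(v, seen):
        for u in adj[v]:
            if u in seen:
                continue
            seen.add(u)
            if u not in match_u or augment(match_u[u], seen):
                match_u[u] = v
                return True
        return False
    return all(augment(v, set()) for v in cand)

def greedy_max_weight_basis(V, adj, pi):
    """Greedy algorithm for a maximum-weight basis of the transversal
    matroid of B=(U,V;E): scan V by non-ascending weight pi and keep v
    whenever the enlarged set is still coverable by a matching."""
    S = []
    for v in sorted(V, key=lambda x: -pi[x]):
        if can_cover(S + [v], adj):
            S.append(v)
    return S

if __name__ == "__main__":
    # toy example: U = {0, 1}, V = {'a', 'b', 'c'}
    adj = {'a': {0}, 'b': {0, 1}, 'c': {1}}
    pi = {'a': 0.5, 'b': 0.3, 'c': 0.2}
    print(greedy_max_weight_basis(['a', 'b', 'c'], adj, pi))  # ['a', 'b'], weight 0.8
```

The oracle is recomputed from scratch at every step, which is wasteful but keeps the sketch short; correctness of the greedy step relies only on the matroid exchange property.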
We suppose in the following that B possesses a matching of size k, that is, a matching that covers U. Then we have |S|=k, since in a matroid all maximal independent sets have the same cardinality.
Let Δ denote the set of all distributions π on V. Since Δ is compact and g, with argument π∈Δ, is a continuous function on Δ, the minimum \(\min_{\pi\in\Delta} g(\pi)\) exists. Denote by Δ∗ the set of all distributions π∗∈Δ where g(π∗) attains this minimum. Δ∗ is compact as well.
In the following we adopt the convention x log₂ x=0 for x=0. The entropy function \(h(\pi):=-\sum_{v\in V}\pi(v)\log_2 \pi(v)\) is continuous, in particular on Δ∗. Hence some distribution σ∈Δ∗ maximizes the entropy. We claim that g(σ)≥k/m. Of course, the Claim then follows for all distributions from Δ∗ and hence from Δ, and then the k/m lower bound is proved for any graph B as specified above. We just need the entropy as an auxiliary measure to prove this Claim. A basic property of the entropy h(π) is that it increases if we replace two items x<y in π with x+ϵ and y−ϵ, where ϵ>0 and x+ϵ≤y−ϵ.
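For completeness, this monotonicity property follows from a standard concavity argument (our addition, with c:=x+y denoting the fixed sum of the two items):

```latex
\[
  \varphi(x) \;=\; -x\log_2 x \;-\; (c-x)\log_2(c-x), \qquad 0 \le x \le c,
  \qquad\quad
  \varphi'(x) \;=\; \log_2\frac{c-x}{x}.
\]
```

Thus φ is strictly increasing on [0,c/2) and strictly decreasing on (c/2,c], so replacing x<y by x+ϵ and y−ϵ (with x+ϵ≤y−ϵ) increases the contribution of these two items to h(π), while all other terms remain unchanged.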
When applying the greedy algorithm to σ, we may assume that the vertices of V are sorted in such a way that, for every existing value a of vertex weights σ(v), those vertices v from {v∈V: σ(v)=a} that enter the basis S appear before the vertices of equal weight that do not enter the basis. Thus we obtain the following structure. We can split S into S=S_1∪S_2∪⋯∪S_r, such that all vertices in each S_i have the same weight, and the weights in each S_i are strictly larger than the weights in S_{i+1}. Furthermore we can split L:=V∖S into L=L_1∪L_2∪⋯∪L_r, such that S_1,L_1,S_2,L_2,…,S_r,L_r is a partition of the whole V into consecutive subsets, sorted by non-ascending weights, and the weights in each L_i are strictly larger than the weights in S_{i+1}. Note that some L_i may be empty.
We also fix some matching that covers S (which exists since S is a basis) and divide U into U=W_1∪W_2∪⋯∪W_r, where W_i comprises the matching partners of the vertices in S_i.
Lemma 21
The vertices in each S_j∪L_j have all their neighbors in W_1∪⋯∪W_j.
Proof
Any edge incident to a vertex v∈L_j has its second vertex in some W_i with i≤j, since otherwise the greedy algorithm would have added v to the basis. The main work is to prove the lemma also for S_j.
Assume that an edge uv exists with u∈W_i, v∈S_j, and i>j. Let w∈S_i be the matching partner of u. We increase f(uw) and decrease f(uv) by some ϵ>0; we will specify below how small it has to be. Let σ′ be the resulting distribution on V. Clearly, the change does not affect the sum of weights of the edges incident with u∈U, and we get σ′(w)=σ(w)+ϵ and σ′(v)=σ(v)−ϵ. Moreover, σ′ has a higher entropy than σ.
Since all σ weights in S_i were equal, we can assume w to be the first vertex in S_i, and since the σ weights in S_{i−1}∪L_{i−1} were strictly larger, the position of w in the order of σ′ weights remains the same as in σ, for a small enough ϵ. Similarly we can assume that v is the last vertex in S_j. If v keeps its position as well, then the greedy algorithm returns the same basis containing both v and w, hence g(σ′)=g(σ). Now assume that g(σ′)>g(σ). Then there must be some v′∈L_j with σ(v′)=σ(v) that replaces v in the greedy basis (otherwise the greedy basis cannot change any more). That means, S_1∪⋯∪S_j∪{v′}∖{v} is an independent set in the transversal matroid of B. Accordingly, let M′ be a matching that covers S_1∪⋯∪S_j∪{v′}∖{v}. Let M be a matching that covers S_1∪⋯∪S_j and maps this set to W_1∪⋯∪W_j. Note that M exists, and u does not belong to any edge in M.
Consider the alternating path A in M′∪M that starts in v′ with an edge of M′. If A ends in U, then we can use the M′-edges in A to cover v′ and those vertices of S_1∪⋯∪S_j that appear in A. The other vertices of S_1∪⋯∪S_j are still covered by their M-edges. If A ends in V, then the last vertex of A is necessarily v, since otherwise we could continue A with another M′-edge. But now we can append the edge vu to A instead, since u is not involved in any M-edge. We are back to the previous situation and can cover v′ and those vertices of S_1∪⋯∪S_j that appear in A, using M′-edges and the extra edge vu. Finally, A cannot be a cycle, i.e., return to v′, since v′ is not incident with any M-edge. In either case we found a matching that covers S_1∪⋯∪S_j∪{v′}. But this contradicts the fact that the greedy algorithm applied to the original σ weights did not put v′ in the basis.
This contradiction disproves the assumption g(σ′)>g(σ). Hence we always obtain g(σ′)=g(σ), since g(σ) was minimal. Thus we have σ′∈Δ∗, contradicting the choice of σ as the distribution with maximum entropy in Δ∗. This contradiction, in turn, shows that the assumed edge uv with u∈W i , v∈S j , and i>j cannot exist, which completes the proof. □
Define k_i:=|S_i|, and let σ(X) denote the weight of any subset X⊆V. Consider any prefix S_1,L_1,…,S_i,L_i of our sequence S_1,L_1,…,S_r,L_r. The edges incident to the k_1+⋯+k_i vertices in W_1∪⋯∪W_i have together the weight (k_1+⋯+k_i)/k, and the vertices in S_1∪L_1∪⋯∪S_i∪L_i get their weights from such edges only, due to Lemma 21. It follows that σ(S_1∪L_1∪⋯∪S_i∪L_i)≤(k_1+⋯+k_i)/k. Recall that k=|S|, m=|V|, and we want to prove σ(S)≥k/m.
At this point we can forget about the graph and consider S_1,L_1,…,S_r,L_r as a sequence of sets of weighted items which enjoys the following properties:
(i) All weights in each S_i are equal.

(ii) All weights in S_i are at least as large as all weights in L_i.

(iii) All weights in S_i are at least as large as all weights in S_{i+1}.

(iv) All prefixes satisfy σ(S_1∪L_1∪⋯∪S_i∪L_i)/σ(V) ≤ (k_1+⋯+k_i)/k.
Our claim is σ(S)/σ(V)≥k/m for such sequences. From this, the original claim for the graph would follow. Note that we have ignored some stronger properties that are no longer needed: The entire sequence of weights is not necessarily monotone, and σ(V)=1 is not required. (Obviously, ratios do not change if we multiply all weights with a common scaling factor.) Now we can proceed without much calculation, just using monotonicity arguments.
Lemma 22
σ(S)/σ(V)≥k/m holds for sequences that satisfy (i)–(iv).
Proof
Let V′ be any prefix, S′=S∩V′, and k′=|S′|. By property (iv), σ(V′) is below average, and by monotonicity property (iii), σ(S′) is above average (for a selection of sets S_i with a total of k′ vertices, together with their L_i). Hence σ(S′)/σ(V′)≥σ(S)/σ(V).
Now we multiply all weights in V′ with a scaling factor smaller than 1, until the weights in the last S_i in V′ equal those in S_{i+1}. Scaling down the prefix obviously does not affect (i) and (ii); moreover, we have respected (iii), and (iv) remains true because diminishing some prefix reduces the relative weight of every prefix. Hence the invariants (i)–(iv) are preserved.
Moreover, σ(S′)/σ(V′)≥σ(S)/σ(V) ensures that the new ratio σ(S)/σ(V) after the scaling can only get smaller, as we have reduced the relative weight of V′ in V. Thus, by repeating this manipulation we can equalize all weights in S without increasing σ(S)/σ(V). But once all items in S have reached the same weight, σ(S)/σ(V)≥k/m is true because of property (ii). □
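The abstraction into weighted item sequences also makes Lemma 22 easy to sanity-check numerically. The following Python sketch (ours, purely illustrative; all names are hypothetical) generates random block sequences satisfying (i)–(iii), discards those violating (iv), and confirms σ(S)/σ(V)≥k/m on the remaining instances.

```python
import random

def random_blocks(r, max_size=3):
    """Random blocks S_1,L_1,...,S_r,L_r satisfying (i)-(iii):
    within each S_i all weights are equal, weights in S_i dominate
    those in L_i, and the S-levels are non-ascending."""
    a = sorted((random.uniform(0.1, 1.0) for _ in range(r)), reverse=True)
    S = [[a[i]] * random.randint(1, max_size) for i in range(r)]
    L = [[random.uniform(a[i + 1] if i + 1 < r else 0.0, a[i])
          for _ in range(random.randint(0, max_size))] for i in range(r)]
    return S, L

def check_claim(S, L):
    """Return None if property (iv) fails, else whether sigma(S)/sigma(V) >= k/m."""
    k = sum(len(b) for b in S)
    m = k + sum(len(b) for b in L)
    sigma_S = sum(sum(b) for b in S)
    sigma_V = sigma_S + sum(sum(b) for b in L)
    pref_w = pref_k = 0
    for Si, Li in zip(S, L):
        pref_w += sum(Si) + sum(Li)
        pref_k += len(Si)
        if pref_w / sigma_V > pref_k / k + 1e-12:   # (iv) violated, discard sample
            return None
    return sigma_S / sigma_V >= k / m - 1e-12

if __name__ == "__main__":
    random.seed(0)
    checked = 0
    for _ in range(20000):
        res = check_claim(*random_blocks(random.randint(1, 4)))
        if res is not None:
            assert res, "counterexample to Lemma 22?"
            checked += 1
    print(f"Lemma 22 confirmed on {checked} random instances")
```

This is of course no substitute for the proof above; it merely exercises properties (i)–(iv) on random data.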
Altogether we can conclude:
Theorem 23
Let B=(U,V;E) be any bipartite graph which is U-balanced with respect to the edge weight function f with sum 1, let π be the distribution on V induced by f, and k=|U|, m=|V|. If B has a matching of size k then B also has a matching that covers some S⊆V with π(S)≥k/m.
The additional assumption that B has a matching of size k is not entirely satisfactory, so finally we prove the statement with a precondition only on the sizes of the partite sets of B.
Theorem 24
Let B=(U,V;E) be any bipartite graph which is U-balanced with respect to the edge weight function f with sum 1, let π be the distribution on V induced by f, and k=|U|, m=|V|. If m≥2k−1 then B has a matching that covers some S⊆V with π(S)≥k/m.
Proof
Let μ<k be the size of a maximum matching. There exists U_0⊂U such that the induced subgraph B[U_0∪V] has a matching that covers U_0, and U_0 is a maximal set with this property. In particular, |U_0|=μ. The total weight of edges in B[U_0∪V] is μ/k. By Theorem 23, V contains an independent set S, in the transversal matroid of B[U_0∪V], with weight at least (μ/m)(μ/k)=μ²/(mk). Since U_0 was maximal, the k−μ vertices in U∖U_0 have all their neighbors in S (a matching covering S saturates all of U_0, so an edge from U∖U_0 to V∖S would allow extending U_0). Thus, in the entire graph B, the basis S has weight at least (k−μ)/k+μ²/(mk)=((k−μ)m+μ²)/(mk). Observe that ((k−μ)m+μ²)/(mk)<k/m would imply (k−μ)m+μ²<k², thus (k−μ)m<k²−μ² and m<k+μ. Consequently, S has weight at least k/m provided that m≥k+μ, in particular if m≥2k−1, since μ≤k−1. □
Cite this article
Damaschke, P., Muhammad, A.S. & Triesch, E. Two New Perspectives on Multi-Stage Group Testing. Algorithmica 67, 324–354 (2013). https://doi.org/10.1007/s00453-013-9781-4