
Two New Perspectives on Multi-Stage Group Testing

Published in Algorithmica.

Abstract

The group testing problem asks to find up to d defective elements out of n elements, by testing subsets (pools) for the presence of defectives. In the strict model of group testing, the goal is to identify all defectives if at most d defectives exist, and otherwise to report that more than d defectives are present. If tests are time-consuming, they should be performed in a small constant number s of stages of parallel tests. It is known that a test number O(d log n), which is optimal up to a constant factor, can be achieved already in s=2 stages. Here we study two aspects of group testing that have not found major attention before. (1) Asymptotic bounds on the test number do not yet lead to optimal strategies for specific n, d, s. Especially for small n we show that randomized strategies significantly save tests on average, compared to worst-case deterministic results. Moreover, the only type of randomness needed is a random permutation of the elements. We solve the problem of constructing optimal randomized strategies for strict group testing completely for the case when d=1 and s≤2. A byproduct of our analysis is that optimal deterministic strategies for strict group testing for d=1 need at most 2 stages. (2) Usually, an element may participate in several pools within a stage. However, when the elements are indivisible objects, every element can belong to at most one pool at the same time. For group testing with disjoint simultaneous pools we show that Θ(sd(n/d)^{1/s}) tests are sufficient and necessary. While the strategy is simple, the challenge is to derive tight lower bounds for different s and different ranges of d versus n.


Notes

  1. Note that this strategy is deterministic up to a random permutation; however, we do not need optimality as in Theorem 2.

References

  1. Aigner, M.: Combinatorial Search. Wiley-Teubner, New York (1988)

  2. Balding, D.J., Torney, D.C.: Optimal pooling designs with error detection. J. Comb. Theory, Ser. A 74, 131–140 (1996)

  3. Bar-Lev, S.K., Boneh, A., Perry, D.: Incomplete identification models for group-testable items. Nav. Res. Logist. 37, 647–659 (1990)

  4. Cheng, Y., Du, D.Z.: New constructions of one- and two-stage pooling designs. J. Comput. Biol. 15, 195–205 (2008)

  5. Cicalese, F., Damaschke, P., Vaccaro, U.: Optimal group testing strategies with interval queries and their application to splice site detection. Int. J. Bioinform. Res. Appl. 1, 363–388 (2005)

  6. Cicalese, F., Damaschke, P., Tansini, L., Werth, S.: Overlaps help: improved bounds for group testing with interval queries. Discrete Appl. Math. 155, 288–299 (2007)

  7. Clements, G.F.: The minimal number of basic elements in a multiset antichain. J. Comb. Theory, Ser. A 25, 153–162 (1978)

  8. Colbourn, C.J., Dinitz, J.H.: The CRC Handbook of Combinatorial Designs. CRC Press, Boca Raton (1996)

  9. Dachman-Soled, D., Servedio, R.: A canonical form for testing Boolean function properties. In: Goldberg, L.A., Jansen, K., Ravi, R., Rolim, J.D.P. (eds.) Approximation, Randomization, and Combinatorial Optimization, Algorithms and Techniques APPROX-RANDOM 2011. LNCS, vol. 6845, pp. 460–471. Springer, Heidelberg (2011)

  10. Damaschke, P., Sheikh Muhammad, A.: Randomized group testing both query-optimal and minimal adaptive. In: Bieliková, M., Friedrich, G., Gottlob, G., Katzenbeisser, S., Turán, G. (eds.) 38th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2012. LNCS, vol. 7147, pp. 214–225. Springer, Heidelberg (2012)

  11. De Bonis, A., Di Crescenzo, G.: Combinatorial group testing for corruption localizing hashing. In: Fu, B., Du, D.Z. (eds.) Computing and Combinatorics, COCOON 2011. LNCS, vol. 6842, pp. 579–591. Springer, Heidelberg (2011)

  12. De Bonis, A., Gasieniec, L., Vaccaro, U.: Optimal two-stage algorithms for group testing problems. SIAM J. Comput. 34, 1253–1270 (2005)

  13. Du, D.Z., Hwang, F.K.: Combinatorial Group Testing and Its Applications. Series on Appl. Math., vol. 18. World Scientific, Singapore (2000)

  14. Du, D.Z., Hwang, F.K.: Pooling Designs and Nonadaptive Group Testing. Series on Appl. Math., vol. 18. World Scientific, Singapore (2006)

  15. Dyachkov, A.G., Rykov, V.V.: Bounds on the length of disjunctive codes. Probl. Inf. Transm. 18, 7–13 (1982) (in Russian)

  16. Edmonds, J.: Matroids and the greedy algorithm. Math. Program. 1, 127–136 (1971)

  17. Eppstein, D., Goodrich, M.T., Hirschberg, D.S.: Improved combinatorial group testing algorithms for real-world problem sizes. SIAM J. Comput. 36, 1360–1375 (2007)

  18. Fang, J., Jiang, Z.L., Yiu, S.M., Hui, L.C.K.: An efficient scheme for hard disk integrity check in digital forensics by hashing with combinatorial group testing. Int. J. Digit. Content Technol. Appl. 5, 300–308 (2011)

  19. Fischer, P., Klasner, N., Wegener, I.: On the cut-off point for combinatorial group testing. Discrete Appl. Math. 91, 83–92 (1999)

  20. Goldreich, O., Trevisan, L.: Three theorems regarding testing graph properties. Random Struct. Algorithms 23, 23–57 (2003)

  21. Goodrich, M.T., Hirschberg, D.S.: Improved adaptive group testing algorithms with applications to multiple access channels and dead sensor diagnosis. J. Comb. Optim. 15, 95–121 (2008)

  22. Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. IRE 40, 1098–1101 (1952)

  23. Huang, S.H., Hwang, F.K.: When is individual testing optimal for nonadaptive group testing? SIAM J. Discrete Math. 14, 540–548 (2001)

  24. Li, C.H.: A sequential method for screening experimental variables. J. Am. Stat. Assoc. 57, 455–477 (1962)

  25. Lubell, D.: A short proof of Sperner’s lemma. J. Comb. Theory 1, 299 (1966)

  26. Mézard, M., Toninelli, C.: Group testing with random pools: optimal two-stage algorithms. IEEE Trans. Inf. Theory 57, 1736–1745 (2011)

  27. Ruszinkó, M.: On the upper bound of the size of the r-cover-free families. J. Comb. Theory, Ser. A 66, 302–310 (1994)

  28. Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math. Z. 27, 544–548 (1928) (in German)

  29. Schlaghoff, J., Triesch, E.: Improved results for competitive group testing. Comb. Probab. Comput. 14, 191–202 (2005)

  30. Welsh, D.J.A.: Matroid Theory. Academic Press, San Diego (1976)

Acknowledgements

The work of the first two authors has been supported by the Swedish Research Council (Vetenskapsrådet), through grant 2010-4661, “Generalized and fast search strategies for parameterized problems”. The first author also received support from RWTH Aachen during a visit in 2011. We thank the referees of ICALP2011GT and of this Special Issue for their careful remarks and suggestions.

Author information

Corresponding author

Correspondence to Peter Damaschke.

Additional information

Part of the paper has been presented by the first author at the ICALP Workshop on Algorithms and Data Structures for Selection, Identification and Encoding ICALP2011GT, Zürich.

Appendix A

A.1 Interpretation of Group Testing as a Game

Here we give a more formal presentation of the game-theoretic setting we use for proving lower bounds. In the group testing problem we are searching for some unknown subset P of an n-element set I of elements, and we can choose pools Q⊆I in order to test whether Q∩P is empty or not.

A deterministic s-stage group testing strategy A is a function which successively selects sets \(T_{i}:=\{Q_{i,1},\ldots,Q_{i,t_{i}}\}\) of pools, 1≤i≤s. Given some subset P⊆I, the result of test stage i is given by the vector \(r_{i}:=(r_{i,1},\ldots,r_{i,t_{i}})\), where r_{i,j}:=0 if Q_{i,j}∩P=∅, and r_{i,j}:=1 otherwise. The strategy may use the results of stages 1,…,i−1 when selecting T_i, hence T_i may depend on P if i>1. The cost of stage i is the number of tests, t_i=t_i(P).

A strategy A solves the strict (d,n) group testing problem if, for each P⊆I with |P|≤d, the sequence of results r_i, 1≤i≤s, uniquely determines P as a subset of I. (In particular, if the strategy is applied to a set P of cardinality larger than d, the test results show that |P|>d.)

We denote by \(\mathcal{A}\) the finite set of all deterministic s-stage strategies solving the strict (d,n) group testing problem. Moreover, we denote by \(t(A,P):=\sum_{i=1}^{s} t_{i}(P)\) the cost of A, given P, and by t(A):=max{t(A,P): P⊆I, |P|≤d} the worst-case cost of strategy A. Now, \(t(n,d,s):=\min_{A\in {\mathcal{A}}}t(A)\).

An adversary strategy S is a function which, given a sequence r_j, 1≤j≤i−1, of possible results for an s-stage group testing algorithm and some test set T_i of pools, chooses some vector r_i=S(r_1,…,r_{i−1},T_i) of results for the tests in T_i. In what follows, we consider only adversary strategies generating results which are consistent with at least one subset P⊆I, and denote the (finite) set of all such strategies by \(\mathcal{S}\).

Given some adversary strategy \(S\in \mathcal{S}\) and an s-stage group testing strategy \(A\in \mathcal{A}\), we define t(A,S) as the total number of tests used if A is applied to the results generated by S, while S is applied to the test sets generated by A.

The number t(A,S) can be interpreted as the total number of tests used in a game between two players, the searcher and the hider. The searcher wants to find P, and the tests she uses are given by algorithm A. The hider provides results for the tests by using her strategy S. The number \(t(S):=\min_{\mathcal{A}}t(A,S)\) denotes the worst-case cost that the hider can enforce by using strategy S.

Clearly, we have \(t(A)=\max_{S\in \mathcal{S}}t(A,S)\), hence \(\min_{\mathcal{A}}t(A)\geq \max_{\mathcal{S}}t(S)\). In fact, it is well known that equality holds: \(\min_{A\in\mathcal{A}}t(A)=\max_{S\in\mathcal{S}}t(S)\).

This can be proved by induction on s by slightly modifying the proof of Proposition 1.31 in [1] and is a special case of the Minimax theorem of Game Theory. Hence t(S) is a lower bound for t(n,d,s) for each adversary strategy S.

In our arguments, we also use a variant of the strategies discussed above: Imagine that, in stage i, the pool set T_i is tested. Instead of providing results for the tests Q_{i,j} only, the hider might be allowed to reveal more information for free, that is, add positive or negative test sets to T_i which are not counted as tests in computing t(A,S). (Of course, the answers must remain consistent with some set P.) For a strategy S in this generalized sense, the inequality t(S)≤t(n,d,s) holds a fortiori.

A randomized strategy is now given by a probability distribution π on \(\mathcal{A}\). Its expected cost is defined as \(E[\pi]:=\sum_{A\in\mathcal{A}}\pi (A)t(A)\). With similar definitions for randomized adversary strategies τ as probability distributions on \(\mathcal{S}\), the Minimax theorem of Game Theory yields the analogous equality between the optimal expected costs of the two players.

A.2 Extremal Matchings in Balanced Weighted Bipartite Graphs

In this section we prove the Claim from Sect. 3.4, provided that m≥2k−1 (equivalently, d≤(q+3)/2). It was already stated and motivated there, so that we can now focus on the proof. We presume familiarity with the notion of a matroid; otherwise we refer to [30].

The transversal matroid of a bipartite graph B=(U,V;E) has as independent sets exactly those sets S⊆V for which there exists a matching that covers S. It is well known that, in any matroid, a maximal independent set, also called a basis, of maximum weight can be computed by a greedy algorithm due to [16]. It works as follows in our case of a transversal matroid, for a given B=(U,V;E) and π: Sort the vertices in V by non-ascending weights π(v), and start with S:=∅. Scan this sorted list and put v into the basis S if and only if the resulting set S∪{v} remains independent; otherwise leave S unchanged. (The greedy algorithm has to call an “oracle” that checks whether some matching covers S.) We denote by g(π) the weight π(S) of a maximum-weight basis S.
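The greedy computation of a maximum-weight basis described above can be sketched as follows. This is a minimal illustration, not code from the paper; the graph encoding (adjacency lists `adj` from V-vertices to their U-neighbors, weight map `pi`) is our own assumption, and the independence oracle is a standard augmenting-path matching test.

```python
def has_covering_matching(S, adj):
    """Oracle: is there a matching that covers every v in S?"""
    match = {}  # u -> matched v

    def augment(v, seen):
        # Try to match v, possibly re-routing previously matched vertices.
        for u in adj[v]:
            if u in seen:
                continue
            seen.add(u)
            if u not in match or augment(match[u], seen):
                match[u] = v
                return True
        return False

    return all(augment(v, set()) for v in S)

def greedy_basis(V, adj, pi):
    """Edmonds' greedy algorithm [16] on the transversal matroid:
    scan V by non-ascending weight, keep v iff independence is preserved."""
    S = []
    for v in sorted(V, key=pi.get, reverse=True):
        if has_covering_matching(S + [v], adj):
            S.append(v)
    return S

# Example: U = {u1, u2}; a and b compete for u1, c can only use u2.
adj = {'a': ['u1'], 'b': ['u1'], 'c': ['u2']}
pi = {'a': 0.5, 'b': 0.3, 'c': 0.2}
assert greedy_basis(['a', 'b', 'c'], adj, pi) == ['a', 'c']
```

In the example, b is rejected because a already occupies u1, so the greedy basis is {a, c} with weight g(π)=0.7.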

We suppose in the following that B possesses a matching of size k, that is, a matching that covers U. Then we have |S|=k, since in a matroid all maximal independent sets have the same cardinality.

Let Δ denote the set of all distributions π on V. Since Δ is compact and g, with argument π∈Δ, is a continuous function on Δ, the minimum min_{π∈Δ} g(π) exists. Denote by Δ* the set of all distributions π*∈Δ where g(π*) attains this minimum. Δ* is compact as well.

In the following we adopt the convention x log₂ x=0 for x=0. The entropy function h(π):=−∑_{v∈V} π(v) log₂ π(v) is continuous, in particular on Δ*. Hence some distribution σ∈Δ* maximizes the entropy. We claim that g(σ)≥k/m. Of course, the Claim then follows for all distributions from Δ* and hence from Δ, and then the k/m lower bound is proved for any graph B as specified above. We just need the entropy as an auxiliary measure to prove this Claim. A basic property of the entropy h(π) is that it increases if we replace two items in π, say x and y with x<y, by x+ϵ and y−ϵ, where ϵ>0 and x+ϵ≤y−ϵ.
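As a quick numeric illustration of this entropy-increase property (our own toy example, not from the paper): moving two unequal weights closer together, while keeping their sum fixed, raises h.

```python
import math

def entropy(p):
    """h(π) = -Σ π(v) log2 π(v), with the convention 0·log2 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Replace x=0.1 and y=0.5 by x+ε=0.2 and y−ε=0.4 (ε=0.1): entropy grows.
assert entropy([0.2, 0.4, 0.4]) > entropy([0.1, 0.5, 0.4])
```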

When applying the greedy algorithm to σ, we may assume that the vertices of V are sorted in such a way that, for every existing value a of vertex weights σ(v), those vertices v from {v∈V: σ(v)=a} that enter the basis S appear before the vertices of equal weight that do not enter the basis. Thus we obtain the following structure. We can split S into S=S_1∪S_2∪⋯∪S_r, such that all vertices in each S_i have the same weight, and the weights in each S_i are strictly larger than the weights in S_{i+1}. Furthermore we can split L:=V∖S into L=L_1∪L_2∪⋯∪L_r, such that S_1,L_1,S_2,L_2,…,S_r,L_r is a partition of the whole V into consecutive subsets, sorted by non-ascending weights, and the weights in each L_i are strictly larger than the weights in S_{i+1}. Note that some L_i may be empty.

We also fix some matching that covers S (which exists since S is a basis) and divide U into U=W 1W 2∪⋯∪W r , where W i comprises the matching partners of vertices in S i .

Lemma 21

The vertices in each S j L j have all their neighbors in W 1∪⋯∪W j .

Proof

Any edge incident to a vertex v∈L_j has its second vertex in some W_i with i≤j, since otherwise the greedy algorithm would have added v to the basis. The main work is to prove the lemma also for S_j.

Assume that an edge uv exists with u∈W_i, v∈S_j, and i>j. Let w∈S_i be the matching partner of u. We increase f(uw) and decrease f(uv) by some ϵ>0; we will specify below how small it has to be. Let σ′ be the resulting distribution on V. Clearly, the change does not affect the sum of weights of the edges incident with u∈U, and we get σ′(w)=σ(w)+ϵ and σ′(v)=σ(v)−ϵ. Moreover, σ′ has a higher entropy than σ.

Since all σ weights in S_i were equal, we can assume w to be the first vertex in S_i, and since the σ weights in S_{i−1}∪L_{i−1} were strictly larger, the position of w in the order of σ′ weights remains the same as in σ, for a small enough ϵ. Similarly we can assume that v is the last vertex in S_j. If v keeps its position as well, then the greedy algorithm returns the same basis containing both v and w, hence g(σ′)=g(σ). Now assume that g(σ′)>g(σ). Then there must be some v′∈L_j with σ(v′)=σ(v) that replaces v in the greedy basis (otherwise the greedy basis cannot change any more). That means, S_1∪⋯∪S_j∪{v′}∖{v} is an independent set in the transversal matroid of B. Accordingly, let M′ be a matching that covers S_1∪⋯∪S_j∪{v′}∖{v}. Let M be a matching that covers S_1∪⋯∪S_j and maps this set onto W_1∪⋯∪W_j. Note that M exists, and u does not belong to any edge in M.

Consider the alternating path A in M′∪M that starts in v′ with an edge of M′. If A ends in U, then we can use the M′-edges in A to cover v′ and those vertices of S 1∪⋯∪S j that appear in A. The other vertices of S 1∪⋯∪S j are still covered by their M-edges. If A ends in V, then the last vertex of A is necessarily v, since otherwise we could continue A with another M′-edge. But now we can append the edge vu to A instead, since u is not involved in any M-edge. We are back to the previous situation and can cover v′ and those vertices of S 1∪⋯∪S j that appear in A, using M′-edges and the extra edge vu. Finally, A cannot be a cycle, i.e., return to v′, since v′ is not incident with any M-edge. In either case we found a matching that covers S 1∪⋯∪S j ∪{v′}. But this contradicts the fact that the greedy algorithm applied to the original σ weights did not put v′ in the basis.

This contradiction disproves the assumption g(σ′)>g(σ). Hence we always obtain g(σ′)=g(σ), since g(σ) was minimal. Thus we have σ′∈Δ*, contradicting the choice of σ as the distribution with maximum entropy in Δ*. This contradiction, in turn, shows that the assumed edge uv with u∈W_i, v∈S_j, and i>j cannot exist, which completes the proof. □

Define k i :=|S i |, and let σ(X) denote the weight of any subset XV. Consider any prefix S 1,L 1,…,S i ,L i of our sequence S 1,L 1,…,S r ,L r . The edges incident to the k 1+⋯+k i vertices in W 1∪⋯∪W i have together the weight (k 1+⋯+k i )/k, and the vertices in S 1L 1∪⋯∪S i L i get their weights from such edges only, due to Lemma 21. It follows σ(S 1L 1∪⋯∪S i L i )≤(k 1+⋯+k i )/k. Recall that k=|S|, m=|V|, and we want to prove σ(S)≥k/m.

At this point we can forget about the graph and consider S 1,L 1,…,S r ,L r as a sequence of sets of weighted items which enjoys the following properties:

  (i) All weights in each S_i are equal.

  (ii) All weights in S_i are at least as large as all weights in L_i.

  (iii) All weights in S_i are at least as large as all weights in S_{i+1}.

  (iv) All prefixes satisfy σ(S_1∪L_1∪⋯∪S_i∪L_i)/σ(V)≤(k_1+⋯+k_i)/k.

Our claim is σ(S)/σ(V)≥k/m for such sequences. From this, the original claim for the graph would follow. Note that we have ignored some stronger properties that are no longer needed: The entire sequence of weights is not necessarily monotone, and σ(V)=1 is not required. (Obviously, ratios do not change if we multiply all weights with a common scaling factor.) Now we can proceed without much calculation, just using monotonicity arguments.

Lemma 22

σ(S)/σ(V)≥k/m holds for sequences that satisfy (i)–(iv).

Proof

Let V′ be any prefix, S′=S∩V′, and k′=|S′|. By property (iv), σ(V′)/σ(V)≤k′/k, that is, σ(V′) is below average, and by monotonicity property (iii), σ(S′)/σ(S)≥k′/k, that is, σ(S′) is above average (for a selection of sets S_i with k′ vertices in total, together with their L_i). Hence σ(S′)/σ(V′)≥σ(S)/σ(V).

Now we multiply all weights in V′ with a scaling factor smaller than 1, until the weights in the last S i in V′ equal those in S i+1. Scaling down the prefix obviously does not affect (i) and (ii), moreover we have respected (iii), and (iv) remains true because diminishing some prefix reduces the relative weight of every prefix. Hence the invariants (i)–(iv) are preserved.

Moreover, σ(S′)/σ(V′)≥σ(S)/σ(V) ensures that the new ratio σ(S)/σ(V) after the scaling can only get smaller, as we have reduced the relative weight of V′ in V. Thus, by repeating this manipulation we can equalize all weights in S without increasing σ(S)/σ(V). But once all items in S have reached the same weight, σ(S)/σ(V)≥k/m is true because of property (ii). □

Altogether we can conclude:

Theorem 23

Let B=(U,V;E) be any bipartite graph which is U-balanced with respect to the edge weight function f with sum 1, let π be the distribution on V induced by f, and k=|U|, m=|V|. If B has a matching of size k, then B also has a matching that covers some S⊆V with π(S)≥k/m.

The additional assumption that B has a matching of size k is not that satisfactory, but finally we prove the statement with a precondition only on the sizes of the partite sets of B.

Theorem 24

Let B=(U,V;E) be any bipartite graph which is U-balanced with respect to the edge weight function f with sum 1, let π be the distribution on V induced by f, and k=|U|, m=|V|. If m≥2k−1, then B has a matching that covers some S⊆V with π(S)≥k/m.

Proof

Let μ<k be the size of a maximum matching. There exists U_0⊆U such that the induced subgraph B[U_0∪V] does have a matching that covers U_0, and U_0 is a maximal set with this property. In particular, |U_0|=μ. The total weight of edges in B[U_0∪V] is μ/k. By Theorem 23, V contains an independent set S, in the transversal matroid of B[U_0∪V], with weight at least (μ/m)(μ/k)=μ²/(mk). Since U_0 was maximal, the k−μ vertices in U∖U_0 have all their neighbors in S. Thus, in the entire graph B, the basis S has a weight at least (k−μ)/k+μ²/(mk)=((k−μ)m+μ²)/(mk). Observe that ((k−μ)m+μ²)/(mk)<k/m would imply (k−μ)m+μ²<k², thus (k−μ)m<k²−μ² and m<k+μ. Consequently, S has weight at least k/m provided that m≥k+μ, in particular if m≥2k−1. □
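The closing algebra can be sanity-checked exhaustively for small parameters. A quick sketch (our own check, not part of the paper), using exact rational arithmetic to avoid floating-point noise:

```python
from fractions import Fraction

# Verify ((k−μ)m + μ²)/(mk) ≥ k/m whenever m ≥ k+μ and μ < k,
# matching the last step of the proof of Theorem 24.
for k in range(1, 15):
    for mu in range(0, k):             # μ: size of a maximum matching, μ < k
        for m in range(k + mu, 40):    # precondition m ≥ k+μ (holds if m ≥ 2k−1)
            weight = Fraction((k - mu) * m + mu * mu, m * k)
            assert weight >= Fraction(k, m)
```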

Cite this article

Damaschke, P., Muhammad, A.S. & Triesch, E. Two New Perspectives on Multi-Stage Group Testing. Algorithmica 67, 324–354 (2013). https://doi.org/10.1007/s00453-013-9781-4
