Abstract
String barcoding is a method for identifying microorganisms by analyzing their genome sequences. In this paper, we study the polylogarithmic string barcoding problem, in which the lengths of the substrings in the test set are polylogarithmically bounded. In particular, we show that the polylogarithmic string barcoding problem remains NP-hard. Moreover, for a problem instance with $n$ sequences, it is NP-hard to achieve an approximation ratio within $d \ln n$ in polynomial time, where $d$ is some constant. We then consider the parameterized polylogarithmic string barcoding problem, in which the number $k$ of substrings in the test set is treated as a fixed parameter. We show that, unless W[2] = FPT, there is no $2^{O(k)} n^{c}$-time algorithm that decides whether a test set of size $k$ exists, where $c$ is a constant independent of $n$ and $k$.
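To make the problem concrete, here is a minimal sketch in Python, assuming the usual formulation of string barcoding: a test set $T$ of substrings is valid when the barcode of each sequence (the 0/1 vector recording which members of $T$ occur in it) is distinct for all $n$ sequences. The length bound `max_len` stands in for the polylogarithmic bound; its value, and all function names below, are hypothetical illustrations rather than the paper's notation. `greedy_test_set` is a plain set-cover-style greedy over pairs of sequences (not the paper's method), and `exists_test_set` is a brute-force search over all subsets of at most $k$ candidates.

```python
from itertools import combinations


def candidate_substrings(seqs, max_len):
    """All distinct substrings of length 1..max_len occurring in any sequence."""
    cands = set()
    for s in seqs:
        for i in range(len(s)):
            for length in range(1, max_len + 1):
                if i + length <= len(s):
                    cands.add(s[i:i + length])
    return sorted(cands)


def barcode(seq, tests):
    """0/1 vector: entry j is 1 iff tests[j] is a substring of seq."""
    return tuple(1 if t in seq else 0 for t in tests)


def is_valid_test_set(seqs, tests):
    """A test set is valid iff every sequence receives a distinct barcode."""
    codes = [barcode(s, tests) for s in seqs]
    return len(set(codes)) == len(codes)


def greedy_test_set(seqs, max_len):
    """Greedy heuristic: repeatedly pick the candidate separating the most
    still-unseparated pairs. Returns None if some pair cannot be separated."""
    cands = candidate_substrings(seqs, max_len)
    n = len(seqs)
    unresolved = {(i, j) for i in range(n) for j in range(i + 1, n)}
    chosen = []
    while unresolved:
        best, best_sep = None, set()
        for t in cands:
            # pairs (i, j) where t occurs in exactly one of the two sequences
            sep = {(i, j) for (i, j) in unresolved
                   if (t in seqs[i]) != (t in seqs[j])}
            if len(sep) > len(best_sep):
                best, best_sep = t, sep
        if best is None:
            return None  # e.g. two identical sequences
        chosen.append(best)
        unresolved -= best_sep
    return chosen


def exists_test_set(seqs, k, max_len):
    """Brute force: try every subset of at most k candidate substrings."""
    cands = candidate_substrings(seqs, max_len)
    for r in range(k + 1):
        for tests in combinations(cands, r):
            if is_valid_test_set(seqs, tests):
                return list(tests)
    return None


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TTGA"]
    print(greedy_test_set(seqs, 2))     # ['AC', 'GA']
    print(exists_test_set(seqs, 1, 2))  # None: one binary test yields at most 2 barcodes
```

The pair-covering greedy is the textbook set-cover greedy over the $n(n-1)/2$ sequence pairs, so its ratio is logarithmic in $n$; the $d \ln n$ inapproximability result above says that, up to the constant, one cannot do better in polynomial time unless P = NP. The brute-force search runs in time roughly $|C|^{k}$ times a polynomial, where $C$ is the candidate set; the parameterized result above rules out improving this to $2^{O(k)} n^{c}$ unless W[2] = FPT.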
Cite this article
Liu, C., Song, Y. & Burge, L.L. Parameterized lower bound and inapproximability of polylogarithmic string barcoding. J Comb Optim 16, 39–49 (2008). https://doi.org/10.1007/s10878-007-9097-x