Abstract
Consider two sets of strings, \( \mathcal{B} \) (bad genes) and \( \mathcal{G} \) (good genes), as well as two integers \( d_b \) and \( d_g \) (\( d_b \le d_g \)). A frequently occurring problem in computational biology (and other fields) is to find a (distinguishing) substring \( s \) of length \( L \) that distinguishes the bad strings from the good strings, i.e., for each string \( s_i \in \mathcal{B} \) there exists a length-\( L \) substring \( t_i \) of \( s_i \) with \( d(s, t_i) \le d_b \) (close to bad strings), and for every substring \( u_i \) of length \( L \) of every string \( g_i \in \mathcal{G} \), \( d(s, u_i) \ge d_g \) (far from good strings). We present a polynomial time approximation scheme to settle the problem, i.e., for any constant \( \epsilon > 0 \), the algorithm finds a string \( s \) of length \( L \) such that for every \( s_i \in \mathcal{B} \) there is a length-\( L \) substring \( t_i \) of \( s_i \) with \( d(t_i, s) \le (1 + \epsilon) d_b \), and for every substring \( u_i \) of length \( L \) of every \( g_i \in \mathcal{G} \), \( d(u_i, s) \ge (1 - \epsilon) d_g \), if a solution to the original pair \( (d_b \le d_g) \) exists.
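To make the two conditions concrete, the following is a minimal sketch of a checker for a candidate string, not the approximation scheme itself. It assumes that \( d \) is the Hamming distance (the abstract does not name the metric), and the function names `hamming`, `windows`, and `is_distinguishing` are hypothetical helpers introduced only for illustration. Setting `eps` to a positive value checks the relaxed thresholds \( (1 + \epsilon) d_b \) and \( (1 - \epsilon) d_g \).

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings (assumed metric d)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def windows(t: str, length: int) -> List[str]:
    """All substrings of t with the given length (empty list if t is too short)."""
    return [t[i:i + length] for i in range(len(t) - length + 1)]


def is_distinguishing(
    s: str,
    bad: Sequence[str],
    good: Sequence[str],
    d_b: int,
    d_g: int,
    eps: float = 0.0,
) -> bool:
    """Check the (optionally eps-relaxed) distinguishing conditions for s."""
    length = len(s)
    close_limit = (1 + eps) * d_b  # upper bound on distance to bad strings
    far_limit = (1 - eps) * d_g    # lower bound on distance to good strings

    # Each bad string needs at least one window close to s.
    for b in bad:
        if not any(hamming(s, w) <= close_limit for w in windows(b, length)):
            return False

    # Every window of every good string must be far from s.
    for g in good:
        if any(hamming(s, w) < far_limit for w in windows(g, length)):
            return False

    return True


if __name__ == "__main__":
    bad = ["TTACGTTT", "GGACGAGG"]
    good = ["TTTTTTTT"]
    print(is_distinguishing("ACGT", bad, good, d_b=1, d_g=3))
    print(is_distinguishing("ACGT", bad, good, d_b=1, d_g=4))
    print(is_distinguishing("ACGT", bad, good, d_b=1, d_g=4, eps=0.25))
```

The checker runs in time proportional to the total input length times \( L \); the hard part of the problem, which the paper addresses, is finding a string \( s \) that passes it.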
Fully supported by a grant from the Natural Science Foundation of China and Research Grants Council of the HKSAR Joint Research Scheme [Project No: NCityU 102/01].
Fully supported by a grant from the Research Grants Council of the Hong Kong SAR, China [Project No: CityU 1130/99E].
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deng, X., Li, G., Li, Z., Ma, B., Wang, L. (2002). A PTAS for Distinguishing (Sub)string Selection. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds) Automata, Languages and Programming. ICALP 2002. Lecture Notes in Computer Science, vol 2380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45465-9_63
DOI: https://doi.org/10.1007/3-540-45465-9_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43864-9
Online ISBN: 978-3-540-45465-6
eBook Packages: Springer Book Archive