Abstract
The largest well predicted subset problem is formulated for the comparison of two predicted 3D protein structures of the same sequence. A 3D protein structure is represented by an ordered point set \(A=\{a_{1},\ldots,a_{n}\}\), where each \(a_{i}\) is a point in 3D space. Given two ordered point sets \(A=\{a_{1},\ldots,a_{n}\}\) and \(B=\{b_{1},\ldots,b_{n}\}\), each containing \(n\) points, and a threshold \(d\), the largest well predicted subset problem is to find a rigid body transformation \(T\) and a largest subset \(B_{opt}\subseteq B\) such that the distance between \(a_{i}\) and \(T(b_{i})\) is at most \(d\) for every \(b_{i}\in B_{opt}\). A meaningful prediction requires that the size of \(B_{opt}\) is at least \(\alpha n\) for some constant \(\alpha\) (Li et al. 2008). We use \(LWPS(A,B,d,\alpha)\) to denote the largest well predicted subset problem with meaningful prediction. A \((1+\delta_{1},1-\delta_{2})\)-approximation for \(LWPS(A,B,d,\alpha)\) is a transformation \(T\) together with a subset \(B'\subseteq B\) of size at least \((1-\delta_{2})|B_{opt}|\) such that, for each \(b_{i}\in B'\), the Euclidean distance satisfies \(distance(a_{i},T(b_{i}))\le(1+\delta_{1})d\). We develop a constant time \((1+\delta_{1},1-\delta_{2})\)-approximation algorithm for \(LWPS(A,B,d,\alpha)\) for arbitrary positive constants \(\delta_{1}\) and \(\delta_{2}\). To our knowledge, this is the first constant time algorithm in this area. Li et al. (2008) showed an \(O(n(\log n)^{2}/\delta_{1}^{5})\) time randomized \((1+\delta_{1})\)-distance approximation algorithm for the largest well predicted subset problem under meaningful prediction.
We also study a closely related problem, the bottleneck distance problem: given two ordered point sets \(A=\{a_{1},\ldots,a_{n}\}\) and \(B=\{b_{1},\ldots,b_{n}\}\), each containing \(n\) points, find the smallest \(d_{opt}\) such that there exists a rigid transformation \(T\) with \(distance(a_{i},T(b_{i}))\le d_{opt}\) for every point \(b_{i}\in B\). A \((1+\delta)\)-approximation for the bottleneck distance problem is a transformation \(T\) such that, for each \(b_{i}\in B\), \(distance(a_{i},T(b_{i}))\le(1+\delta)d_{opt}\), where \(\delta\) is a constant. For an arbitrary constant \(\delta\), we obtain a linear \(O(n/\delta^{6})\) time \((1+\delta)\)-approximation algorithm for the bottleneck distance problem. The best previously known algorithms for both problems require super-linear time (Li et al. 2008).
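To make the two objectives concrete, the following is a minimal sketch that evaluates them for one fixed, already chosen rigid transformation \(T(p)=Rp+t\). It is not the approximation algorithm of the article, which must search over transformations. The helper names (`apply_rigid`, `well_predicted_subset`, `bottleneck_under`) and the toy coordinates are assumptions made for illustration only.

```python
import math

def apply_rigid(R, t, p):
    """Apply T(p) = R p + t, where R is a 3x3 rotation matrix and t a translation."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))

def well_predicted_subset(A, B, R, t, d):
    """Indices i with distance(a_i, T(b_i)) <= d for the given (hypothetical) T."""
    return [i for i, (a, b) in enumerate(zip(A, B))
            if math.dist(a, apply_rigid(R, t, b)) <= d]

def bottleneck_under(A, B, R, t):
    """max_i distance(a_i, T(b_i)) for the given T: an upper bound on d_opt."""
    return max(math.dist(a, apply_rigid(R, t, b)) for a, b in zip(A, B))

# Toy data (made up): B is A shifted by +1 along x, except for the last point.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
B = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
t = (-1.0, 0.0, 0.0)  # undo the shift; no rotation is needed in this toy case

subset = well_predicted_subset(A, B, IDENTITY, t, d=0.5)
worst = bottleneck_under(A, B, IDENTITY, t)
print(subset, worst)
```

On this toy input the first three pairs align exactly and the fourth does not, so the well predicted subset under this \(T\) has size 3 while the bottleneck value is dominated by the single outlier. This illustrates why the subset formulation tolerates a few badly predicted residues and the bottleneck formulation does not.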
References
Ambühl C, Chakraborty S, Gärtner B (2000) Computing largest common point sets under approximate congruence. In: ESA 2000. LNCS, vol 1879, pp 52–63
Arun KS, Huang TS, Blostein SD (1987) Least-squares fitting of two 3-d point sets. IEEE Trans Pattern Anal Mach Intell 9(5):698–700
Choi V, Goyal N (2004) A combinatorial shape matching algorithm for rigid protein docking. In: CPM 2004. LNCS, vol 3109, pp 285–296
Goodrich M, Mitchell J, Orletsky M (1994) Practical methods for approximate geometric pattern matching under rigid motions. In: SOCG 1994, pp 103–112
Indyk P, Motwani R (1999) Geometric matching under noise: combinatorial bounds and algorithms. In: SODA 1999, pp 457–465
Lancia G, Istrail S (2003) Protein structure comparison: algorithms and applications. In: Mathematical methods for protein structure analysis and design, pp 1–33
Li M, Ma B, Wang L (2002) On the closest string and substring problems. J ACM 49(2):157–171
Li SC, Bu D, Xu J, Li M (2008) Finding largest well-predicted subset of protein structure models. In: CPM 2008. LNCS, vol 5029, pp 44–55
Motwani R, Raghavan P (2000) Randomized algorithms. Cambridge University Press, Cambridge
Siew N, Elofsson A, Rychlewski L, Fischer D (2000) Maxsub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9):776–785
Zemla A (2003) LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 31(13):3370–3374
Zhang Y, Skolnick J (2004) Scoring function for automated assessment of protein structure template quality. Proteins 57:702–710
Cite this article
Fu, B., Wang, L. Constant time approximation scheme for largest well predicted subset. J Comb Optim 25, 352–367 (2013). https://doi.org/10.1007/s10878-010-9371-1
DOI: https://doi.org/10.1007/s10878-010-9371-1