Constant time approximation scheme for largest well predicted subset

Journal of Combinatorial Optimization

Abstract

The largest well predicted subset problem is formulated for the comparison of two predicted 3D protein structures from the same sequence. A 3D protein structure is represented by an ordered point set \(A=\{a_1,\ldots,a_n\}\), where each \(a_i\) is a point in 3D space. Given two ordered point sets \(A=\{a_1,\ldots,a_n\}\) and \(B=\{b_1,b_2,\ldots,b_n\}\) containing n points, and a threshold d, the largest well predicted subset problem is to find the rigid body transformation T for a largest subset \(B_{opt}\) of B such that the distance between \(a_i\) and \(T(b_i)\) is at most d for every \(b_i\) in \(B_{opt}\). A meaningful prediction requires that the size of \(B_{opt}\) is at least \(\alpha n\) for some constant α (Li et al. in CPM 2008, 2008). We use LWPS(A,B,d,α) to denote the largest well predicted subset problem with meaningful prediction.

A \((1+\delta_1,1-\delta_2)\)-approximation for LWPS(A,B,d,α) is a transformation T that brings a subset \(B'\subseteq B\) of size at least \((1-\delta_2)|B_{opt}|\) close to A, in the sense that for each \(b_i\in B'\) the Euclidean distance satisfies \(\mathrm{distance}(a_i,T(b_i))\le(1+\delta_1)d\). We develop a constant-time \((1+\delta_1,1-\delta_2)\)-approximation algorithm for LWPS(A,B,d,α) for arbitrary positive constants \(\delta_1\) and \(\delta_2\). To our knowledge, this is the first constant-time algorithm in this area. Li et al. (CPM 2008, 2008) showed an \(O(n(\log n)^{2}/\delta_{1}^{5})\) time randomized \((1+\delta_1)\)-distance approximation algorithm for the largest well predicted subset problem under meaningful prediction.

We also study a closely related problem, the bottleneck distance problem: given two ordered point sets \(A=\{a_1,\ldots,a_n\}\) and \(B=\{b_1,b_2,\ldots,b_n\}\) containing n points, find the smallest \(d_{opt}\) such that there exists a rigid transformation T with \(\mathrm{distance}(a_i,T(b_i))\le d_{opt}\) for every point \(b_i\in B\). A \((1+\delta)\)-approximation for the bottleneck distance problem is a transformation T such that \(\mathrm{distance}(a_i,T(b_i))\le(1+\delta)d_{opt}\) for each \(b_i\in B\), where δ is a constant. For an arbitrary constant δ, we obtain a linear \(O(n/\delta^{6})\) time \((1+\delta)\)-approximation algorithm for the bottleneck distance problem. The best known algorithms for both problems require super-linear time (Li et al. in CPM 2008, 2008).
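To make the two approximation criteria concrete, here is a minimal sketch of a verifier. It is not the authors' constant-time or linear-time algorithm, which the abstract does not describe. It only evaluates a given candidate transformation T against the definitions. It assumes that T is represented as a 3×3 rotation matrix R plus a translation vector t, so that T(p) = Rp + t, and that \(|B_{opt}|\) and \(d_{opt}\) are supplied from elsewhere, since computing them is the hard part. All function names (`apply_rigid`, `well_predicted_subset`, `meets_lwps_guarantee`, `bottleneck_under`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # assumed to be a proper 3x3 rotation matrix (not checked)


def rotation_z(theta: float) -> List[List[float]]:
    """Rotation by theta radians about the z-axis (hypothetical helper for the example)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid(R: Matrix, t: Point, p: Point) -> Point:
    """Assumed representation of the rigid body transformation: T(p) = R p + t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def residuals(A: Sequence[Point], B: Sequence[Point], R: Matrix, t: Point) -> List[float]:
    """distance(a_i, T(b_i)) for every i; A and B are paired by their order."""
    if len(A) != len(B):
        raise ValueError("A and B must contain the same number of points")
    return [math.dist(a, apply_rigid(R, t, b)) for a, b in zip(A, B)]


def well_predicted_subset(A, B, R, t, threshold: float) -> List[int]:
    """Indices i whose transformed b_i lies within `threshold` of a_i."""
    return [i for i, r in enumerate(residuals(A, B, R, t)) if r <= threshold]


def meets_lwps_guarantee(A, B, R, t, d, delta1, delta2, opt_size) -> bool:
    """(1+delta1, 1-delta2) check: at least (1-delta2)*|B_opt| points within (1+delta1)*d.

    opt_size stands in for |B_opt| and must be known from elsewhere (e.g. an exact solver).
    """
    subset = well_predicted_subset(A, B, R, t, (1 + delta1) * d)
    return len(subset) >= (1 - delta2) * opt_size


def bottleneck_under(A, B, R, t) -> float:
    """Bottleneck objective for a fixed T: max_i distance(a_i, T(b_i))."""
    return max(residuals(A, B, R, t))


def meets_bottleneck_guarantee(A, B, R, t, d_opt: float, delta: float) -> bool:
    """(1+delta) check: every pair is within (1+delta)*d_opt; d_opt must be supplied."""
    return bottleneck_under(A, B, R, t) <= (1 + delta) * d_opt


# Example: A is B rotated 90 degrees about z and shifted by t, except that the
# last point of A is displaced by 5 units (a "badly predicted" residue).
R, t = rotation_z(math.pi / 2), (1.0, 2.0, 3.0)
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
A = [(1.0, 2.0, 3.0), (1.0, 3.0, 3.0), (0.0, 2.0, 3.0), (0.0, 3.0, 9.0)]

print(well_predicted_subset(A, B, R, t, threshold=0.1))  # [0, 1, 2]
print(bottleneck_under(A, B, R, t))
```

Each check costs O(n) time for one candidate transformation. In this example the displaced last point makes the bottleneck distance about 5, while the other three points form a well predicted subset for any small threshold d.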

References

  • Ambühl C, Chakraborty S, Gärtner B (2000) Computing largest common point sets under approximate congruence. In: ESA 2000. LNCS, vol 1879, pp 52–63

  • Arun KS, Huang TS, Blostein SD (1987) Least-squares fitting of two 3-d point sets. IEEE Trans Pattern Anal Mach Intell 9(5):698–700

  • Choi V, Goyal N (2004) A combinatorial shape matching algorithm for rigid protein docking. In: CPM 2004. LNCS, vol 3109, pp 285–296

  • Goodrich M, Mitchell J, Orletsky M (1994) Practical methods for approximate geometric pattern matching under rigid motions. In: SOCG 1994, pp 103–112

  • Indyk P, Motwani R (1999) Geometric matching under noise: combinatorial bounds and algorithms. In: SODA 1999, pp 457–465

  • Lancia G, Istrail S (2003) Protein structure comparison: algorithms and applications. In: Mathematical methods for protein structure analysis and design, pp 1–33

  • Li M, Ma B, Wang L (2002) On the closest string and substring problems. J ACM 49(2):157–171

  • Li SC, Bu D, Xu J, Li M (2008) Finding largest well-predicted subset of protein structure models. In: CPM 2008. LNCS, vol 5029, pp 44–55

  • Motwani R, Raghavan P (2000) Randomized algorithms. Cambridge University Press, Cambridge

  • Siew N, Elofsson A, Rychlewski L, Fischer D (2000) Maxsub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9):776–785

  • Zemla A (2003) LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 31(13):3370–3374

  • Zhang Y, Skolnick J (2004) Scoring function for automated assessment of protein structure template quality. Proteins 57:702–710


Author information

Correspondence to Bin Fu.

About this article

Cite this article

Fu, B., Wang, L. Constant time approximation scheme for largest well predicted subset. J Comb Optim 25, 352–367 (2013). https://doi.org/10.1007/s10878-010-9371-1

  • DOI: https://doi.org/10.1007/s10878-010-9371-1
