Constant time approximation scheme for largest well predicted subset

Fu, Bin; Wang, Lusheng

doi:10.1007/s10878-010-9371-1

Published: 08 December 2010

Volume 25, pages 352–367, (2013)

Journal of Combinatorial Optimization

Bin Fu¹ &
Lusheng Wang²

Abstract

The largest well predicted subset problem is formulated for comparison of two predicted 3D protein structures from the same sequence. A 3D protein structure is represented by an ordered point set A={a₁,…,aₙ}, where each aᵢ is a point in 3D space. Given two ordered point sets A={a₁,…,aₙ} and B={b₁,b₂,…,bₙ}, each containing n points, and a threshold d, the largest well predicted subset problem is to find a rigid body transformation T and a largest subset B_opt of B such that the distance between aᵢ and T(bᵢ) is at most d for every bᵢ in B_opt. A meaningful prediction requires that the size of B_opt is at least αn for some constant α (Li et al. in CPM 2008, 2008). We use LWPS(A,B,d,α) to denote the largest well predicted subset problem with meaningful prediction. A (1+δ₁,1−δ₂)-approximation for LWPS(A,B,d,α) is a transformation T together with a subset B′⊆B of size at least (1−δ₂)|B_opt| such that for each bᵢ∈B′, the Euclidean distance satisfies distance(aᵢ,T(bᵢ))≤(1+δ₁)d. We develop a constant time (1+δ₁,1−δ₂)-approximation algorithm for LWPS(A,B,d,α) for arbitrary positive constants δ₁ and δ₂. To our knowledge, this is the first constant time algorithm in this area. Li et al. (CPM 2008, 2008) showed an O(n(log n)²/δ₁⁵) time randomized (1+δ₁)-distance approximation algorithm for the largest well predicted subset problem under meaningful prediction. We also study a closely related problem, the bottleneck distance problem, where we are given two ordered point sets A={a₁,…,aₙ} and B={b₁,b₂,…,bₙ}, each containing n points, and the problem is to find the smallest d_opt such that there exists a rigid transformation T with distance(aᵢ,T(bᵢ))≤d_opt for every point bᵢ∈B. A (1+δ)-approximation for the bottleneck distance problem is a transformation T such that for each bᵢ∈B, distance(aᵢ,T(bᵢ))≤(1+δ)d_opt, where δ is a constant. For an arbitrary constant δ, we obtain a linear O(n/δ⁶) time (1+δ)-approximation algorithm for the bottleneck distance problem. The best known algorithms for both problems require super-linear time (Li et al. in CPM 2008, 2008).
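
The two objectives defined in the abstract can be made concrete with a minimal Python sketch that evaluates them for one fixed candidate transformation T(p) = Rp + t. This is not the constant time algorithm of the paper, which the abstract does not describe; it only checks the definitions. All function names, the row-major 3x3 matrix representation of the rotation, and the sample coordinates are assumptions made for illustration.

```python
import math

def apply_rigid(rotation, translation, point):
    """Apply T(p) = R p + t, with R given as a 3x3 row-major matrix (assumed layout)."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )

def well_predicted_indices(a, b, rotation, translation, d):
    """Indices i with distance(a_i, T(b_i)) <= d for the given, fixed T."""
    return [
        i for i, (ai, bi) in enumerate(zip(a, b))
        if math.dist(ai, apply_rigid(rotation, translation, bi)) <= d
    ]

def bottleneck_value(a, b, rotation, translation):
    """max_i distance(a_i, T(b_i)) for the given T (not minimised over T)."""
    return max(
        math.dist(ai, apply_rigid(rotation, translation, bi))
        for ai, bi in zip(a, b)
    )

def satisfies_approximation(a, b, rotation, translation, d, opt_size, delta1, delta2):
    """Check the (1+delta1, 1-delta2) condition for a given T.

    opt_size stands for |B_opt|; it must be supplied by the caller,
    since this sketch does not compute the optimum.
    """
    kept = well_predicted_indices(a, b, rotation, translation, (1 + delta1) * d)
    return len(kept) >= (1 - delta2) * opt_size

# Hypothetical toy data: four paired points and the identity transformation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ZERO = (0.0, 0.0, 0.0)
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
B = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 0.0)]

subset = well_predicted_indices(A, B, IDENTITY, ZERO, d=0.5)  # [0, 1, 2]
worst = bottleneck_value(A, B, IDENTITY, ZERO)
```

In the toy data the fourth pair is far apart, so it is excluded from the well predicted subset at threshold 0.5 but dominates the bottleneck value. This illustrates why the two problems behave differently on structures with a few badly predicted residues.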

References

Ambühl C, Chakraborty S, Gärtner B (2000) Computing largest common point sets under approximate congruence. In: ESA 2000. LNCS, vol 1879, pp 52–63
Arun KS, Huang TS, Blostein SD (1987) Least-squares fitting of two 3-d point sets. IEEE Trans Pattern Anal Mach Intell 9(5):698–700
Choi V, Goyal N (2004) A combinatorial shape matching algorithm for rigid protein docking. In: CPM 2004. LNCS, vol 3109, pp 285–296
Goodrich M, Mitchell J, Orletsky M (1994) Practical methods for approximate geometric pattern matching under rigid motions. In: SOCG 1994, pp 103–112
Indyk P, Motwani R (1999) Geometric matching under noise: combinatorial bounds and algorithms. In: SODA 1999, pp 457–465
Lancia G, Istrail S (2003) Protein structure comparison: algorithms and applications. In: Mathemat methods for protein struct analysis and design, pp 1–33
Li M, Ma B, Wang L (2002) On the closest string and substring problems. J ACM 49(2):157–171
Li SC, Bu D, Xu J, Li M (2008) Finding largest well-predicted subset of protein structure models. In: CPM 2008. LNCS, vol 5029, pp 44–55
Motwani R, Raghavan P (2000) Randomized algorithms. Cambridge University Press, Cambridge
Siew N, Elofsson A, Rychlewski L, Fischer D (2000) Maxsub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9):776–785
Zemla A (2003) LGA: a method for finding 3d similarities in protein structures. Nucleic Acids Res 31(13):3370–3374
Zhang Y, Skolnick J (2004) Scoring function for automated assessment of protein structure template quality. Proteins 57:702–710

Author information

Authors and Affiliations

Department of Computer Science, University of Texas–Pan American, Edinburg, TX, 78539, USA
Bin Fu
Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong
Lusheng Wang

Corresponding author

Correspondence to Bin Fu.

About this article

Cite this article

Fu, B., Wang, L. Constant time approximation scheme for largest well predicted subset. J Comb Optim 25, 352–367 (2013). https://doi.org/10.1007/s10878-010-9371-1

Published: 08 December 2010
Issue Date: April 2013
DOI: https://doi.org/10.1007/s10878-010-9371-1
