Finding Longest Common Segments in Protein Structures in Nearly Linear Time

  • Conference paper
Combinatorial Pattern Matching (CPM 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7354)

Abstract

The Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for comparing protein structures. One of the two components of LGA requires computing the longest common contiguous segments between two protein structures. That is, given two structures \(A = (a_1, \ldots, a_n)\) and \(B = (b_1, \ldots, b_n)\), where \(a_k, b_k \in \mathbb{R}^3\), we are to find, among all pairs of segments \(f = (a_i, \ldots, a_j)\) and \(g = (b_i, \ldots, b_j)\) that fulfill a certain similarity criterion, those of maximum length. We consider the following criteria: (1) the root mean square deviation (RMSD) between \(f\) and \(g\) is within a given \(t \in \mathbb{R}\); (2) \(f\) and \(g\) can be superposed such that \(\|a_k - b_k\| \le t\) for each \(k\) with \(i \le k \le j\), for a given \(t \in \mathbb{R}\). We give an algorithm with \(O(n \log n + n l)\) time complexity for the first criterion, where \(l\) is the maximum length of the segments fulfilling that criterion. We show an FPTAS which, for any \(\varepsilon > 0\), finds a segment of length at least \(l\), but with RMSD up to \((1 + \varepsilon)t\), in \(O(n \log n + n/\varepsilon)\) time. We also propose an FPTAS which, for any given \(\varepsilon > 0\), finds all the segments \(f\) and \(g\) of maximum length that can be superposed such that \(\|a_k - b_k\| \le (1 + \varepsilon)t\) for each \(k\) with \(i \le k \le j\), thus fulfilling the second criterion approximately. This algorithm runs in \(O(n \log^2 n / \varepsilon^5)\) time when consecutive points in \(A\) are separated by the same distance, which is the case for protein structures.
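
To make criterion (1) concrete, here is a minimal Python sketch of a simple quadratic-time baseline. It is *not* the \(O(n \log n + n l)\) algorithm of the paper. It assumes, as is usual in LGA-style comparisons, that the RMSD of a segment pair is measured after the optimal rigid superposition (rotation plus translation). That superposition is obtained here in closed form with Horn's unit-quaternion method. Horn's method is a swapped-in technique that solves the same least-squares fitting problem as the methods of Kabsch, Arun et al., and Umeyama listed in the references. Prefix sums of coordinates, squared norms, and cross products let each segment's RMSD be evaluated in O(1) time, so trying all O(n²) segments costs O(n²) overall. All function names are hypothetical. A small Jacobi eigenvalue routine keeps the sketch dependency-free; with NumPy one would normally call `numpy.linalg.eigvalsh` instead.

```python
import math


def largest_eigenvalue_sym4(mat, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in mat]
    size = len(a)
    for _ in range(sweeps):
        off = sum(a[p][q] * a[p][q] for p in range(size) for q in range(size) if p != q)
        if off < 1e-24:
            break
        for p in range(size - 1):
            for q in range(p + 1, size):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(size):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(size):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(size))


def build_prefix(A, B):
    """prefix[k] = sums over pairs 0..k-1 of the 17 moments a segment query needs."""
    prefix = [[0.0] * 17]
    for a, b in zip(A, B):
        feats = list(a) + list(b)                                  # 6 coordinate sums
        feats += [sum(x * x for x in a), sum(x * x for x in b)]    # 2 squared norms
        feats += [a[r] * b[c] for r in range(3) for c in range(3)]  # 9 cross terms
        prefix.append([s + f for s, f in zip(prefix[-1], feats)])
    return prefix


def segment_rmsd(prefix, i, j):
    """Minimal RMSD over rotations and translations between a[i:j] and b[i:j], in O(1)."""
    m = j - i
    d = [hi - lo for hi, lo in zip(prefix[j], prefix[i])]
    ca = [d[r] / m for r in range(3)]          # centroid of the A-segment
    cb = [d[3 + r] / m for r in range(3)]      # centroid of the B-segment
    ga = d[6] - m * sum(x * x for x in ca)     # centered sum of squares, A
    gb = d[7] - m * sum(x * x for x in cb)     # centered sum of squares, B
    # Centered cross-covariance S[r][c] = sum over k of (a_r - ca_r)(b_c - cb_c).
    S = [[d[8 + 3 * r + c] - m * ca[r] * cb[c] for c in range(3)] for r in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's 4x4 quaternion matrix; its top eigenvalue equals max over R of sum b . (R a).
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = largest_eigenvalue_sym4(horn)
    # Minimal sum of squared deviations is ga + gb - 2 * lam; clamp round-off below zero.
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / m))


def longest_segments_within_rmsd(A, B, t):
    """Return (l, [(i, j), ...]) for all maximum-length segments with RMSD <= t.

    Quadratic baseline: RMSD is not monotone in segment length (the squared-deviation
    sum grows, but so does the divisor), so every length is tried, longest first.
    """
    n = len(A)
    prefix = build_prefix(A, B)
    for length in range(n, 0, -1):
        hits = [(i, i + length) for i in range(n - length + 1)
                if segment_rmsd(prefix, i, i + length) <= t + 1e-9]  # slack for round-off
        if hits:
            return length, hits
    return 0, []
```

Segments are half-open 0-based ranges `(i, j)`. Consider a case where `B` is a rotated and translated copy of a ten-point helix `A` whose last point is then displaced by several units. There, `longest_segments_within_rmsd(A, B, 0.01)` should report the first nine points as the only longest segment.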

References

  1. Arun, K.S., Huang, T.S., Blostein, S.D.: Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell. 9(5), 698–700 (1987)

  2. Bowie, J.U., Luthy, R., Eisenberg, D.: A method to identify protein sequences that fold into a known 3-dimensional structure. Science 253(5016), 164–170 (1991)

  3. Bryant, S.H., Altschul, S.F.: Statistics of sequence-structure threading. Current Opinion in Structural Biology 5(2), 236–244 (1995)

  4. Choi, V., Goyal, N.: A Combinatorial Shape Matching Algorithm for Rigid Protein Docking. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 285–296. Springer, Heidelberg (2004)

  5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press (2009)

  6. Cristobal, S., Zemla, A., Fischer, D., Rychlewski, L., Elofsson, A.: A study of quality measures for protein threading models. BMC Bioinformatics 2(5) (2001)

  7. Jones, D.T., Taylor, W.R., Thornton, J.M.: A new approach to protein fold recognition. Nature 358, 86–89 (1992)

  8. Kabsch, W.: A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A 32(5), 922–923 (1976)

  9. Kabsch, W.: A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A 34(5), 827–828 (1978)

  10. Rychlewski, L., Fischer, D., Elofsson, A.: LiveBench-6: large-scale automated evaluation of protein structure prediction servers. Proteins 53(suppl. 6), 542–547 (2003)

  11. Li, S.C., Bu, D., Xu, J., Li, M.: Finding nearly optimal GDT scores. J. Comput. Biol. 18(5), 693–704 (2011)

  12. Siew, N., Elofsson, A., Rychlewski, L., Fischer, D.: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9), 776–785 (2000)

  13. Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268(1), 209–225 (1997)

  14. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)

  15. Wu, S., Skolnick, J., Zhang, Y.: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology 5(17) (2007)

  16. Zemla, A.: LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Research 31(13), 3370–3374 (2003)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ng, Y.K., Ono, H., Ge, L., Li, S.C. (2012). Finding Longest Common Segments in Protein Structures in Nearly Linear Time. In: Kärkkäinen, J., Stoye, J. (eds) Combinatorial Pattern Matching. CPM 2012. Lecture Notes in Computer Science, vol 7354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31265-6_27

  • DOI: https://doi.org/10.1007/978-3-642-31265-6_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31264-9

  • Online ISBN: 978-3-642-31265-6

  • eBook Packages: Computer Science, Computer Science (R0)
