Phase Transition in Sequence Unique Reconstruction

Journal of Systems Science and Complexity

Abstract

In this paper, sequence unique reconstruction refers to the property that a sequence can be uniquely reconstructed from all of its K-tuples. We propose and study the phase transition behavior of the probability P(K) of unique reconstruction as a function of the tuple size K in random sequences (iid model). In Monte Carlo experiments, artificial proteins generated from the iid model exhibit a phase transition in which P(K) jumps abruptly from a low-value phase (e.g., < 0.1) to a high-value phase (e.g., > 0.9). Generalizing to an arbitrary alphabet, we prove that for a random sequence of length L, when L is sufficiently large, P(K) undergoes a sharp phase transition provided p ≤ 0.1015, where p = P(two random letters match). We also derive formulas to estimate the transition points, which may be of practical use in sequencing DNA by hybridization. Our study indicates that most proteins do not deviate greatly from random sequences with respect to unique reconstruction, although some “stubborn” proteins become uniquely reconstructable only at a very large K, which probably has biological implications.
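The quantities named in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' procedure: it assumes that "all its K-tuples" means the multiset of overlapping K-tuples (with multiplicities), that a sequence is uniquely reconstructable when no other sequence has the same multiset, and that letters are drawn uniformly from the alphabet. The function names (`count_reconstructions`, `is_uniquely_reconstructable`, `estimate_unique_probability`, `match_probability`) are hypothetical, and the uniqueness check is a plain backtracking search that is only practical for short sequences.

```python
import random
from collections import Counter


def count_reconstructions(tuples, limit=2):
    """Count distinct sequences whose K-tuple multiset equals `tuples`.

    Counting stops once `limit` sequences are found. Assumption: K-tuples
    are taken with multiplicity. Recursive backtracking, so this is only
    meant for short sequences.
    """
    remaining = Counter(tuples)
    total = sum(remaining.values())
    by_prefix = {}          # (K-1)-prefix -> distinct K-tuples starting with it
    balance = Counter()     # out-degree minus in-degree of each (K-1)-mer
    for t, c in remaining.items():
        by_prefix.setdefault(t[:-1], []).append(t)
        balance[t[:-1]] += c
        balance[t[1:]] -= c
    # A walk using every tuple must start at a (K-1)-mer with surplus
    # out-degree; if all are balanced, any tuple may come first.
    start_nodes = {n for n, b in balance.items() if b > 0}
    firsts = [t for t in remaining if not start_nodes or t[:-1] in start_nodes]
    found = 0

    def extend(last, used):
        nonlocal found
        if used == total:
            found += 1
            return
        for nxt in by_prefix.get(last[1:], ()):
            if remaining[nxt] > 0:
                remaining[nxt] -= 1
                extend(nxt, used + 1)
                remaining[nxt] += 1
                if found >= limit:
                    return

    for first in firsts:
        remaining[first] -= 1
        extend(first, 1)
        remaining[first] += 1
        if found >= limit:
            break
    return found


def is_uniquely_reconstructable(seq, K):
    """True if `seq` is the only sequence with its multiset of K-tuples."""
    tuples = [seq[i:i + K] for i in range(len(seq) - K + 1)]
    return count_reconstructions(tuples) == 1


def estimate_unique_probability(K, L, alphabet, trials=200, seed=0):
    """Monte Carlo estimate of P(K) for uniform iid sequences of length L."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice(alphabet) for _ in range(L))
        hits += is_uniquely_reconstructable(seq, K)
    return hits / trials


def match_probability(freqs):
    """p = P(two independent random letters match) = sum of squared frequencies."""
    return sum(f * f for f in freqs)
```

Under these assumptions, evaluating `estimate_unique_probability(K, L, alphabet)` over a range of K for a fixed short L gives an empirical P(K) curve. For a uniform 20-letter alphabet, `match_probability([1/20] * 20)` is 0.05, which lies below the 0.1015 bound stated in the abstract, whereas a uniform 4-letter alphabet gives 0.25. The backtracking search can take exponential time in the worst case, so a graph-theoretic uniqueness criterion would be preferable for long sequences.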



Author information


Corresponding author

Correspondence to Chan Zhou.

Additional information

The two authors contributed to this work equally.


About this article

Cite this article

Xia, L., Zhou, C. Phase Transition in Sequence Unique Reconstruction. Jrl Syst Sci & Complex 20, 18–29 (2007). https://doi.org/10.1007/s11424-007-9001-x


  • DOI: https://doi.org/10.1007/s11424-007-9001-x
