Abstract
In this paper, sequence unique reconstruction refers to the property that a sequence can be uniquely reconstructed from all of its K-tuples. We propose and study the phase transition behavior of the probability P(K) of unique reconstruction as a function of the tuple size K in random sequences (iid model). In Monte Carlo experiments, artificial proteins generated from the iid model exhibit a phase transition in which P(K) jumps abruptly from a low-value phase (e.g., < 0.1) to a high-value phase (e.g., > 0.9). Generalizing to an arbitrary alphabet, we prove that for a random sequence of length L, when L is sufficiently large, P(K) undergoes a sharp phase transition provided p ≤ 0.1015, where p = P(two random letters match). We also derive formulas for estimating the transition points, which may be of practical use in sequencing DNA by hybridization. Our study indicates that most proteins do not deviate greatly from random sequences with respect to sequence unique reconstruction, while some “stubborn” proteins become uniquely reconstructable only at a very large K, and these probably have biological implications.
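The Monte Carlo estimate of P(K) described above can be illustrated with a minimal Python sketch. It is not the authors' procedure. It rests on several assumptions that the abstract does not state: the K-tuples are treated as a multiset (multiplicities known), the starting tuple is not known in advance, and letters are drawn uniformly from a 20-letter amino-acid alphabet, so that p = 1/20 = 0.05, which lies below the 0.1015 threshold. The function names `k_tuples`, `is_uniquely_reconstructable` and `estimate_p` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def k_tuples(seq, k):
    """Return all overlapping K-tuples of seq, in order, with repeats."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def is_uniquely_reconstructable(seq, k):
    """Brute-force check (assumption: multiset of K-tuples, unknown start).

    Counts the distinct sequences that use every K-tuple exactly once
    and stops as soon as a second one is found.
    """
    counts = Counter(k_tuples(seq, k))
    total = sum(counts.values())
    # Index the distinct tuples by their first K-1 letters.
    by_prefix = defaultdict(list)
    for t in counts:
        by_prefix[t[:-1]].append(t)
    found = 0

    def extend(last, used):
        nonlocal found
        if used == total:
            found += 1
            return
        # The next tuple must overlap the last one in K-1 letters.
        for nxt in by_prefix.get(last[1:], ()):
            if counts[nxt] > 0:
                counts[nxt] -= 1
                extend(nxt, used + 1)
                counts[nxt] += 1
                if found >= 2:
                    return

    for start in list(counts):
        counts[start] -= 1
        extend(start, 1)
        counts[start] += 1
        if found >= 2:
            break
    return found == 1


def estimate_p(k, length, alphabet="ACDEFGHIKLMNPQRSTVWY", trials=200, seed=0):
    """Monte Carlo estimate of P(K) for iid uniform sequences (assumed model)."""
    rng = random.Random(seed)
    hits = sum(
        is_uniquely_reconstructable("".join(rng.choices(alphabet, k=length)), k)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, estimate_p(k, length=40, trials=100))
```

The backtracking search is exponential in the worst case, so the sketch is only suitable for short sequences. For realistic protein lengths, a test based on Eulerian paths in the graph of (K-1)-tuples would be the natural replacement.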
Additional information
The two authors contributed to this work equally.
About this article
Cite this article
Xia, L., Zhou, C. Phase Transition in Sequence Unique Reconstruction. Jrl Syst Sci & Complex 20, 18–29 (2007). https://doi.org/10.1007/s11424-007-9001-x
DOI: https://doi.org/10.1007/s11424-007-9001-x