Abstract
We present a method for segmenting qualitative sequences according to composition criteria whose definition and evaluation are founded on the notions of predictors and additive prediction. Given a set of predictors, any partition of a sequence can be evaluated precisely. We present a language for declaring predictors. Two problems arise: optimizing the partition of a sequence into a given number of segments, and choosing a suitable number of segments for the partitioning. We present an algorithm which, given a sequence and a set of predictors, successively computes the optimal partitions of the sequence for increasing numbers of segments. The time and space complexity of the algorithm are linear in the length of the sequence and in the number of predictors. Experimentally, the computed partitions are highly stable with regard to the number of segments. We present an application of this approach to determining the origins of replication of bacterial chromosomes.
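The idea of scoring partitions with additive predictors can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the chapter's implementation. It assumes that a predictor is a function giving a numeric score to each symbol, that a segment's score is the best sum any single predictor achieves over it, and that a partition's score is the sum of its segment scores. The names `best_partitions` and `Predictor` and the `(start, end, predictor_index)` layout are hypothetical. For simplicity the sketch keeps full tables, so its memory use grows with the number of segments, and its running time is proportional to the sequence length times the number of predictors times the number of segments.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a predictor scores one symbol; scores add up along a segment.
Predictor = Callable[[str], float]
Segment = Tuple[int, int, int]  # (start, end exclusive, predictor index)


def best_partitions(
    seq: str, predictors: Sequence[Predictor], max_segments: int
) -> List[Tuple[float, List[Segment]]]:
    """For k = 1..max_segments, return (score, segments) of a best k-partition."""
    n, n_pred = len(seq), len(predictors)
    if n == 0 or n_pred == 0:
        raise ValueError("need a non-empty sequence and at least one predictor")
    kmax = min(max_segments, n)
    gain = [[p(c) for p in predictors] for c in seq]
    neg_inf = float("-inf")

    # score[k][i][p]: best score of a (k+1)-segment partition of seq[:i+1]
    # whose last segment is predicted by predictor p.
    # start[k][i][p]: start index of that last segment.
    score: List[List[List[float]]] = []
    start: List[List[List[int]]] = []
    for k in range(kmax):
        sc = [[neg_inf] * n_pred for _ in range(n)]
        st = [[0] * n_pred for _ in range(n)]
        for i in range(k, n):  # k+1 segments need at least k+1 positions
            if k == 0:
                open_new = 0.0 if i == 0 else neg_inf
            else:
                # Best k-segment partition of the prefix ending at i-1.
                open_new = max(score[k - 1][i - 1])
            for p in range(n_pred):
                extend = sc[i - 1][p] if i > 0 else neg_inf
                if extend >= open_new:  # keep growing the current segment
                    sc[i][p] = extend + gain[i][p]
                    st[i][p] = st[i - 1][p]
                else:  # position i opens a new segment
                    sc[i][p] = open_new + gain[i][p]
                    st[i][p] = i
        score.append(sc)
        start.append(st)

    results = []
    for k in range(kmax):
        segments: List[Segment] = []
        i = n - 1
        for j in range(k, -1, -1):  # walk back through the segments
            p = max(range(n_pred), key=lambda q: score[j][i][q])
            s = start[j][i][p]
            segments.append((s, i + 1, p))
            i = s - 1
        segments.reverse()
        results.append((max(score[k][n - 1]), segments))
    return results


if __name__ == "__main__":
    def is_a(c: str) -> float:
        return 1.0 if c == "A" else 0.0

    def is_c(c: str) -> float:
        return 1.0 if c == "C" else 0.0

    for total, segments in best_partitions("AAAACCCCAAAA", [is_a, is_c], 3):
        print(len(segments), total, segments)
```

In this toy run, one or two segments cannot score more than 8, while three segments recover the A, C, A blocks exactly with a score of 12. Comparing the scores obtained for successive numbers of segments is one simple way to look for a suitable number of segments.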
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guéguen, L. (2001). Segmentation by Maximal Predictive Partitioning According to Composition Biases. In: Gascuel, O., Sagot, MF. (eds) Computational Biology. JOBIM 2000. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45727-5_4
DOI: https://doi.org/10.1007/3-540-45727-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42242-6
Online ISBN: 978-3-540-45727-5
eBook Packages: Springer Book Archive