MML Markov classification of sequential data

Edgoose, T.; Allison, L.

doi:10.1023/A:1008907921792

Published: November 1999

Statistics and Computing, volume 9, pages 269–278 (1999)

Abstract

General-purpose unsupervised classification programs have typically assumed independence between observations in the data they analyse. In this paper we report on an extension to the MML classifier Snob which enables the program to take advantage of some of the extra information implicit in ordered datasets (such as time series). Specifically, the data are modelled as if generated by a first-order Markov process with as many states as there are classes of observation. The state of such a process at any point in the sequence determines the class from which the corresponding observation is generated. Such a model is commonly referred to as a Hidden Markov Model. We present the MML calculation of the expected length of a near-optimal two-part message that states a specific model of this type and then the dataset given that model. Such an estimate enables a fair comparison of models that differ in the number of classes they specify, which in turn can guide a robust unsupervised search of the model space. The new program, tSnob, is tested on both ‘synthetic’ data and a large ‘real world’ dataset and is found to make unbiased estimates of model parameters and to conduct an effective search of the extended model space.
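The model structure described in the abstract can be illustrated with a minimal Python sketch. It is not the tSnob implementation and does not reproduce the paper's MML formula. It only samples from a first-order Markov chain whose states are classes, and evaluates the negative log-likelihood of a sequence with the standard scaled forward recursion, which corresponds loosely to the second part (data given model) of a two-part message. The Gaussian class distributions, the uniform initial class, and the names `sample_sequence` and `data_length_nits` are assumptions made for illustration. The cost of stating the model itself is omitted, and because densities are used, the result is a length only up to a constant set by the measurement precision.

```python
import math
import random


def sample_sequence(trans, means, sds, length, seed=0):
    """Sample hidden classes and observations from a first-order Markov model.

    trans[i][j] is the probability of moving from class i to class j.
    Each class emits a Gaussian observation (an illustrative assumption).
    """
    rng = random.Random(seed)
    k = len(trans)
    state = rng.randrange(k)  # assumption: uniform initial class
    states, obs = [], []
    for _ in range(length):
        states.append(state)
        obs.append(rng.gauss(means[state], sds[state]))
        state = rng.choices(range(k), weights=trans[state])[0]
    return states, obs


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def data_length_nits(obs, trans, means, sds):
    """Negative log-likelihood (in nits) of obs, summing over all class paths.

    Uses the scaled forward recursion; the initial class is assumed uniform.
    """
    k = len(trans)
    alpha = [gaussian_pdf(obs[0], means[j], sds[j]) / k for j in range(k)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(k))
                * gaussian_pdf(obs[t], means[j], sds[j])
                for j in range(k)
            ]
        scale = sum(alpha)
        total -= math.log(scale)  # accumulate -log P(obs[t] | obs[:t])
        alpha = [a / scale for a in alpha]
    return total


if __name__ == "__main__":
    sticky = [[0.95, 0.05], [0.05, 0.95]]  # classes tend to persist
    flat = [[0.5, 0.5], [0.5, 0.5]]        # equivalent to independent classes
    means, sds = [0.0, 5.0], [1.0, 1.0]
    _, obs = sample_sequence(sticky, means, sds, 500, seed=1)
    print("Markov model:     ", data_length_nits(obs, sticky, means, sds))
    print("independent model:", data_length_nits(obs, flat, means, sds))
```

When the data really are sequentially dependent, the Markov model should encode them in fewer nits than the model with uniform transition rows, which behaves like a conventional mixture that ignores ordering. A full comparison in the spirit of the abstract would also charge each model for the cost of stating its parameters.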



About this article


Edgoose, T., Allison, L. MML Markov classification of sequential data. Statistics and Computing 9, 269–278 (1999). https://doi.org/10.1023/A:1008907921792
