Minimum Message Length Grouping of Ordered Data

Fitzgibbon, Leigh J.; Allison, Lloyd; Dowe, David L.

doi:10.1007/3-540-40992-0_5


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1968)

Included in the following conference series:

International Conference on Algorithmic Learning Theory


Abstract

Explicit segmentation is the partitioning of data into homogeneous regions by specifying cut-points. W. D. Fisher (1958) gave an early example of explicit segmentation based on the minimisation of squared error. Fisher called this the grouping problem and devised a polynomial-time Dynamic Programming Algorithm (DPA). Oliver, Baxter and colleagues (1996, 1997, 1998) have applied the information-theoretic Minimum Message Length (MML) principle to explicit segmentation. They derived formulas for specifying cut-points imprecisely and showed empirically that their criterion is superior to other segmentation criteria (AIC, MDL and BIC). We use a simple MML criterion and Fisher's DPA to perform numerical Bayesian summation and integration (using message lengths) over the cut-point location parameters. This gives an estimate of the number of segments, which we then use to estimate the cut-point positions and segment parameters by minimising the MML criterion. The resulting method is shown to achieve lower Kullback-Leibler distances on generated data.
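Fisher's grouping problem, as described above, can be solved by dynamic programming over the start of the last segment. The Python below is a minimal sketch under our own assumptions, not the authors' implementation: the names `segment_costs`, `fisher_dpa` and `log_sum_over_cuts` are hypothetical, and the per-segment cost is plain squared error (Fisher's criterion), not the paper's MML message length, which the abstract does not spell out. `log_sum_over_cuts` illustrates the summing idea only: replacing `min` with log-sum-exp in the same recursion sums `exp(-scale * error)` over every placement of the cut-points. Treating `scale * error` as a per-segment message length in nats (e.g. `scale = 1 / (2 * sigma**2)` for Gaussian segments of known `sigma`, up to constants) is an assumption.

```python
import math
from typing import List, Tuple


def segment_costs(x: List[float]) -> List[List[float]]:
    """cost[i][j] = squared error of x[i:j] about its own mean (for j > i)."""
    n = len(x)
    s = [0.0] * (n + 1)   # prefix sums of x
    s2 = [0.0] * (n + 1)  # prefix sums of x**2
    for t, v in enumerate(x):
        s[t + 1] = s[t] + v
        s2[t + 1] = s2[t] + v * v
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            tot = s[j] - s[i]
            # Clamp tiny negative values caused by floating-point rounding.
            cost[i][j] = max(0.0, (s2[j] - s2[i]) - tot * tot / (j - i))
    return cost


def fisher_dpa(x: List[float], k: int) -> Tuple[float, List[int]]:
    """Minimum total squared error over all splits of x into k contiguous groups.

    Returns (error, cuts), where cuts are the indices at which groups 2..k start.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    cost = segment_costs(x)
    # best[g][j]: least error for splitting x[:j] into g groups.
    best = [[math.inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            # The last group is x[i:j]; the first g-1 groups cover x[:i].
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost[i][j]
                if c < best[g][j]:
                    best[g][j], back[g][j] = c, i
    cuts, j = [], n
    for g in range(k, 1, -1):  # walk back-pointers; group 1 always starts at 0
        j = back[g][j]
        cuts.append(j)
    return best[k][n], cuts[::-1]


def log_sum_over_cuts(x: List[float], k: int, scale: float = 1.0) -> float:
    """log of the sum, over every placement of k-1 cut-points, of exp(-scale * error).

    Same recursion as fisher_dpa with min replaced by log-sum-exp, so all
    C(n-1, k-1) segmentations are summed in O(k n^2) time without enumeration.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    cost = segment_costs(x)
    lse = [[-math.inf] * (n + 1) for _ in range(k + 1)]
    lse[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            terms = [lse[g - 1][i] - scale * cost[i][j] for i in range(g - 1, j)]
            m = max(terms)  # subtract the max for numerical stability
            lse[g][j] = m + math.log(sum(math.exp(t - m) for t in terms))
    return lse[k][n]
```

For example, `fisher_dpa([0, 0, 0, 10, 10, 10], 2)` returns `(0.0, [3])`: the single cut-point falls where the level changes. Both recursions take O(k n²) time once the O(n²) cost table is built, which is what makes summing over cut-point locations practical compared with enumerating every segmentation.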


References

H. Akaike. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csáki, editors, Proceedings of the 2nd International Symposium on Information Theory, pages 267–281. Akadémiai Kiadó, Budapest, 1973.
R. A. Baxter and J. J. Oliver. MDL and MML: Similarities and differences. Technical Report TR 207, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia, 1994.
R. A. Baxter and J. J. Oliver. The kindest cut: minimum message length segmentation. In S. Arikawa and A. K. Sharma, editors, Proc. 7th Int. Workshop on Algorithmic Learning Theory, volume 1160 of LNCS, pages 83–90. Springer-Verlag, Berlin, 1996.
J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, London, 1988.
D. L. Dowe, R. A. Baxter, J. J. Oliver, and C. S. Wallace. Point estimation using the Kullback-Leibler loss function and MML. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), volume 1394 of LNAI, pages 87–95, 1998.
D. L. Dowe, J. J. Oliver, and C. S. Wallace. MML estimation of the parameters of the spherical Fisher distribution. In S. Arikawa and A. K. Sharma, editors, Proc. 7th Int. Workshop on Algorithmic Learning Theory, volume 1160 of LNCS, pages 213–227. Springer-Verlag, Berlin, 1996.
T. Edgoose and L. Allison. MML Markov classification of sequential data. Statistics and Computing, 9:269–278, 1999.
W. D. Fisher. On grouping for maximum homogeneity. Journal of the American Statistical Association, 53:789–798, 1958.
R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90(430):773–795, 1995.
J. J. Oliver, R. A. Baxter, and C. S. Wallace. Minimum message length segmentation. In X. Wu, R. Kotagiri, and K. Korb, editors, Research and Development in Knowledge Discovery and Data Mining (PAKDD-98), pages 83–90. Springer, 1998.
J. J. Oliver and C. S. Forbes. Bayesian approaches to segmenting a simple time series. Technical Report 97/336, Dept. of Computer Science, Monash University, Australia 3168, December 1997.
J. J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.
J. J. Rissanen. A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11(2):416–431, 1983.
J. J. Rissanen. Hypothesis selection and testing by the MDL principle. Computer Jrnl., 42(4):260–269, 1999.
G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461–464, 1978.
S. Sclove. Time-series segmentation: A model and a method. Information Sciences, 29:7–25, 1983.
M. Viswanathan, C. S. Wallace, D. L. Dowe, and K. Korb. Finding cutpoints in noisy binary sequences - a revised empirical evaluation. In 12th Australian Joint Conference on Artificial Intelligence, 1999. A sequel has been submitted to Machine Learning Journal.
C. S. Wallace and D. M. Boulton. An information measure for classification. Computer Jrnl., 11(2):185–194, August 1968.
C. S. Wallace and D. L. Dowe. Minimum message length and Kolmogorov complexity. Computer Jrnl., 42(4):270–283, 1999.
C. S. Wallace and D. L. Dowe. Rejoinder. Computer Jrnl., 42(4):345–357, 1999.
C. S. Wallace and D. L. Dowe. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 10:73–83, 2000.
C. S. Wallace and P. R. Freeman. Estimation and inference by compact encoding (with discussion). Journal of the Royal Statistical Society, Series B, 49:240–265, 1987.


Author information

Authors and Affiliations

School of Computer Science and Software Engineering, Monash University, Clayton, VIC 3168, Australia
Leigh J. Fitzgibbon, Lloyd Allison & David L. Dowe


Editor information

Editors and Affiliations

Department of Informatics, Kyushu University, Hakozaki 6-10-1, 812-8581, Fukuoka, Japan
Hiroki Arimura
School of Computing, National University of Singapore, 3 Science Drive 2, 117543, Singapore, Singapore
Sanjay Jain
School of Computer Science and Engineering, The University of New South Wales, 2052, Sydney, Australia
Arun Sharma


About this paper

Cite this paper

Fitzgibbon, L.J., Allison, L., Dowe, D.L. (2000). Minimum Message Length Grouping of Ordered Data. In: Arimura, H., Jain, S., Sharma, A. (eds) Algorithmic Learning Theory. ALT 2000. Lecture Notes in Computer Science, vol 1968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40992-0_5


DOI: https://doi.org/10.1007/3-540-40992-0_5
Published: 19 October 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41237-3
Online ISBN: 978-3-540-40992-2
eBook Packages: Springer Book Archive
