
Minimum message length segmentation

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 1394))

Abstract

The segmentation problem arises in many applications in data mining, A.I. and statistics, including segmenting time series, decision tree algorithms and image processing. In this paper, we consider a range of criteria which may be applied to determine whether some data should be segmented into two or more regions. We develop an information-theoretic criterion (MML) for the segmentation of univariate data with Gaussian errors. We perform simulations comparing segmentation methods (MML, AIC, MDL and BIC) and conclude that MML is the preferred criterion. We then apply the segmentation method to financial time series data.
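
As an illustration of the segment-or-not decision the abstract describes, the sketch below (in Python, not taken from the paper) compares a one-segment Gaussian model against the best two-segment split using a generic BIC-style penalized negative log-likelihood. The MML, AIC and MDL criteria studied in the paper differ chiefly in how this model cost is computed; the function names and the min_len guard are illustrative assumptions, not the authors' formulation.

    import numpy as np

    def gaussian_nll(x):
        # Negative log-likelihood of x under a Gaussian with ML mean and variance.
        n = len(x)
        var = np.var(x)
        if n < 2 or var <= 0.0:
            return np.inf
        return 0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)

    def segment_once(x, min_len=5):
        # Charge each model a BIC-style penalty of (1/2) log n per free parameter,
        # plus log n for encoding the cut point in the two-segment case.
        n = len(x)
        cost_one = gaussian_nll(x) + 0.5 * 2 * np.log(n)   # one mean, one variance
        best_cut, best_cost = None, np.inf
        for c in range(min_len, n - min_len + 1):
            cost = (gaussian_nll(x[:c]) + gaussian_nll(x[c:])
                    + 0.5 * 4 * np.log(n) + np.log(n))      # two means, two variances, cut point
            if cost < best_cost:
                best_cut, best_cost = c, cost
        return best_cost < cost_one, best_cut

    # Toy example: a univariate series whose mean shifts halfway through.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
    print(segment_once(x))   # expect (True, cut index near 100)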



References

  1. H. Akaike. Information theory and an extension of the maximum likelihood principle. In B.N. Petrov and F. Csaki, editors, Proc. of the 2nd International Symposium on Information Theory, pages 267–281, 1973.

  2. R.A. Baxter and J.J. Oliver. The kindest cut: minimum message length segmentation. In S. Arikawa and A. Sharma, editors, Lecture Notes in Artificial Intelligence 1160, Algorithmic Learning Theory, ALT-96, pages 83–90, 1996.

  3. J.H. Conway and N.J.A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, London, 1988.

  4. B. Dom. MDL estimation with small sample sizes including an application to the problem of segmenting binary strings using Bernoulli models. Technical Report RJ 9997 (89085) 12/15/95, IBM Research Division, Almaden Research Center, 650 Harry Rd, San Jose, CA, 95120–6099, 1995.

  5. G. Koop and S.M. Potter. Bayes Factors and nonlinearity: Evidence from economic time series. UCLA Working Paper, August 1995, submitted to Journal of Econometrics.

  6. Mengxiang Li. Minimum description length based 2-D shape description. In IEEE 4th Int. Conf. on Computer Vision, pages 512–517, May 1992.

  7. Z. Liang, R.J. Jaszczak, and R.E. Coleman. Parameter estimation of finite mixtures using the EM algorithm and information criteria with applications to medical image processing. IEEE Trans. on Nuclear Science, 39(4):1126–1133, 1992.

  8. J.J. Oliver and D.J. Hand. Introduction to minimum encoding inference. Technical report TR 4-94, Dept. of Statistics, Open University, Walton Hall, Milton Keynes, MK7 6AA, UK, 1994. Available on the WWW from http://www.cs.monash.edu.au/~jono.

  9. J.J. Oliver, R.A. Baxter, and C.S. Wallace. Unsupervised Learning using MML. In Machine Learning: Proc. of the Thirteenth International Conference (ICML 96), pages 364–372. Morgan Kaufmann Publishers, San Francisco, CA, 1996. Available on the WWW from http://www.cs.monash.edu.au/~jono.

  10. B. Pfahringer. Compression-based discretization of continuous attributes. In Machine Learning: Proc. of the Twelfth International Workshop, pages 456–463, 1995.

  11. J.R. Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90, 1996.

  12. J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.

  13. G. Schwarz. Estimating the dimension of a model. Ann. Stat., 6:461–464, 1978.

  14. S.L. Sclove. On segmentation of time series. In S. Karlin, T. Amemiya, and L. Goodman, editors, Studies in econometrics, time series, and multivariate statistics, pages 311–330. Academic Press, 1983.

  15. C.W. Therrien. Decision, estimation, and classification: an introduction to pattern recognition and related topics. Wiley, New York, 1989.

  16. H. Tong. Non-linear time series: a dynamical system approach. Clarendon Press, Oxford, 1990.

  17. C.S. Wallace and D.M. Boulton. An information measure for classification. Computer Journal, 11:185–194, 1968.

  18. C.S. Wallace and P.R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society (Series B), 49:240–252, 1987.




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oliver, J.J., Baxter, R.A., Wallace, C.S. (1998). Minimum message length segmentation. In: Wu, X., Kotagiri, R., Korb, K.B. (eds) Research and Development in Knowledge Discovery and Data Mining. PAKDD 1998. Lecture Notes in Computer Science, vol 1394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64383-4_19


  • DOI: https://doi.org/10.1007/3-540-64383-4_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64383-8

  • Online ISBN: 978-3-540-69768-8
