Single factor analysis in MML mixture modelling

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1394)

Abstract

Mixture modelling concerns the unsupervised discovery of clusters within data. Most current clustering algorithms assume that variables within classes are uncorrelated. We present a method for producing and evaluating models which account for inter-attribute correlation within classes with a single Gaussian linear factor. The method used is Minimum Message Length (MML), an invariant, information-theoretic Bayesian hypothesis evaluation criterion. Our work extends and unifies that of Wallace and Boulton (1968) and Wallace and Freeman (1992), concerned respectively with MML mixture modelling and MML single factor analysis. Results on simulated data are comparable to those of Wallace and Freeman (1992), outperforming Maximum Likelihood. We include an application of mixture modelling with single factors on spectral data from the Infrared Astronomical Satellite. Our model shows fewer unnecessary classes than that produced by AutoClass (Goebel et al., 1989) due to the use of factors in modelling correlation.
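To make the idea of a single Gaussian linear factor concrete, the following is a minimal sketch of how one class could generate correlated attributes. It assumes the standard single factor form x_k = mu_k + a_k v + sigma_k r_k, with v and each r_k independent standard normal variables; the function names, parameter values, and sample size are illustrative assumptions, not taken from the paper, and the MML estimation itself is not shown.

```python
import random

def sample_single_factor(mu, a, sigma, n, rng):
    """Draw n points from one class: x_k = mu_k + a_k * v + sigma_k * r_k,
    where v and each r_k are independent standard normal draws."""
    data = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # one shared factor score per data point
        data.append([m + ak * v + s * rng.gauss(0.0, 1.0)
                     for m, ak, s in zip(mu, a, sigma)])
    return data

def implied_covariance(a, sigma):
    """Covariance implied by the model for one class: a a^T + diag(sigma^2)."""
    k = len(a)
    return [[a[i] * a[j] + (sigma[i] ** 2 if i == j else 0.0)
             for j in range(k)] for i in range(k)]

def sample_covariance(data):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, k = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
             for j in range(k)] for i in range(k)]

# Hypothetical parameters for a single class with three attributes.
rng = random.Random(0)
mu, a, sigma = [0.0, 5.0, -2.0], [1.0, 2.0, -1.5], [0.5, 0.5, 0.5]
data = sample_single_factor(mu, a, sigma, 20000, rng)
model_cov = implied_covariance(a, sigma)
emp_cov = sample_covariance(data)
```

Under these assumptions the off-diagonal entries of the class covariance come entirely from the factor loadings (a_i a_j), which is the correlation that a model with uncorrelated attributes within classes could only approximate by adding extra classes.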

References

  1. G. Beichman, H.J. Neugebauer, P.E. Clegg, and Y.J. Chester. IRAS catalogs and atlases: Explanatory supplement, 1988.

  2. P. Cheeseman, M. Self, J. Kelly, W. Taylor, D. Freeman, and J. Stutz. Bayesian classification. In Seventh National Conference on Artificial Intelligence, pages 607–611, Saint Paul, Minnesota, 1988.

  3. D.L. Dowe, L. Allison, T.I. Dix, L. Hunter, C.S. Wallace, and T. Edgoose. Circular clustering of protein dihedral angles by Minimum Message Length. In Proc. 1st Pacific Symp. Biocomp., pages 242–255, HI, U.S.A., 1996.

  4. D.L. Dowe, R.A. Baxter, J.J. Oliver, and C.S. Wallace. Point Estimation using the Kullback-Leibler Loss Function and MML. In Proc. 2nd Pacific Asian Conference on Knowledge Discovery and Data Mining (PAKDD'98), Melbourne, Australia, April 1998. Springer-Verlag. Accepted, to appear.

  5. D.L. Dowe and C.S. Wallace. Resolving the Neyman-Scott problem by Minimum Message Length. In Proc. Computing Science and Statistics — 28th Symposium on the interface, volume 28, pages 614–618, 1997.

  6. J. Goebel, K. Volk, H. Walker, F. Gerbault, P. Cheeseman, M. Self, J. Stutz, and W. Taylor. A Bayesian classification of the IRAS LRS Atlas. Astron. Astrophys., 225:L5–L8, 1989.

  7. H.H. Harman. Modern Factor Analysis. University of Chicago Press, USA, 2nd edition, 1967.

  8. G.E. Hinton, P. Dayan, and M. Revow. Modelling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks, 8(1):65–74, 1997.

  9. G.E. Hinton and R. Zemel. Autoencoders, minimum description length and Helmholtz free energy. In Cowan et al., editors, Advances in Neural Information Processing Systems, 1994.

  10. G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley, New York, 1996.

  11. J. Neyman and E.L. Scott. Consistent estimates based on partially consistent observations. Econometrica, 16:1–32, 1948.

  12. F.M. Olnon and E. Raimond. IRAS catalogs and atlases: Atlas of low resolution spectra, 1986.

  13. D.M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Inc., 1985.

  14. C.S. Wallace. Classification by Minimum Message Length inference. In G. Goos and J. Hartmanis, editors, Advances in Computing and Information — ICCI '90, pages 72–81. Springer-Verlag, Berlin, 1990.

  15. C.S. Wallace. Multiple Factor Analysis by MML Estimation. Technical Report 95218, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia, 1995. submitted to J. Multiv. Analysis.

  16. C.S. Wallace and D.M. Boulton. An information measure for classification. Computer Journal, 11:185–194, 1968.

  17. C.S. Wallace and D.L. Dowe. MML estimation of the von Mises concentration parameter. Technical report TR 93193, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia, 1993. prov. accepted, Aust. J. Stat.

  18. C.S. Wallace and D.L. Dowe. MML mixture modelling of Multi-state, Poisson, von Mises circular and Gaussian distributions. In Proc. 6th Int. Workshop on Artif. Intelligence and Stat., pages 529–536, 1997. An abstract is in Proc. Sydney Int. Stat. Congr. (SISC-96, 1996), p197; and also in IMS Bulletin (1996), 25 (4), p410.

  19. C.S. Wallace and P.R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society (Series B), 49:240–252, 1987.

  20. C.S. Wallace and P.R. Freeman. Single factor analysis by MML estimation. Journal of the Royal Statistical Society (Series B), 54:195–209, 1992.


Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Edwards, R.T., Dowe, D.L. (1998). Single factor analysis in MML mixture modelling. In: Wu, X., Kotagiri, R., Korb, K.B. (eds) Research and Development in Knowledge Discovery and Data Mining. PAKDD 1998. Lecture Notes in Computer Science, vol 1394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64383-4_9

  • DOI: https://doi.org/10.1007/3-540-64383-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64383-8

  • Online ISBN: 978-3-540-69768-8

  • eBook Packages: Springer Book Archive
