
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3176)

Abstract

We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modeling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental concepts in graphical models, and inference algorithms on graphs. This is followed by a quick tour of approximate Bayesian inference, including Markov chain Monte Carlo (MCMC), Laplace approximation, BIC, variational approximations, and expectation propagation (EP). The aim of this chapter is to provide a high-level view of the field. Along the way, many state-of-the-art ideas and future directions are also reviewed.
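To make the E-step/M-step alternation mentioned above concrete, here is a minimal NumPy sketch of EM for a one-dimensional mixture of Gaussians. It is an illustration only, not code from the chapter; the function and variable names (em_gmm_1d, pi_k, mu, var, resp) are invented for this example.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture to data x with EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialise mixing weights, means, and variances.
    pi_k = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibilities resp[n, j] proportional to pi_j * N(x_n | mu_j, var_j).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi_k * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the weighted sufficient statistics.
        nk = resp.sum(axis=0)
        pi_k = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi_k, mu, var

# Example usage: two well-separated clusters.
data = np.concatenate([np.random.normal(-3, 1, 200), np.random.normal(3, 1, 200)])
print(em_gmm_1d(data, k=2))
```

Each iteration increases (or leaves unchanged) the data log-likelihood, which is the property the chapter's derivation of EM establishes in general.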




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ghahramani, Z. (2004). Unsupervised Learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds) Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science (LNAI), vol. 3176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28650-9_5


  • DOI: https://doi.org/10.1007/978-3-540-28650-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23122-6

  • Online ISBN: 978-3-540-28650-9

