
Mixtures of Product Components Versus Mixtures of Dependence Trees

Abstract

Mixtures of product components assume independence of variables given the index of the component. They can be efficiently estimated from data by means of the EM algorithm and have a number of other useful properties. On the other hand, by considering mixtures of dependence trees, we can explicitly describe the statistical relationships between pairs of variables at the level of individual components, and the approximation power of the resulting mixture may therefore increase substantially. However, in an application to the classification of numerals we have found that both models perform comparably and that the contribution of the dependence-tree structures to the log-likelihood criterion decreases in the course of the EM iterations. Thus the optimal estimate of a dependence-tree mixture tends to reduce to a simple product mixture model.
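
For concreteness, the two model classes can be written as follows (the notation is assumed for this summary and is not quoted from the paper). A mixture of product components takes the form

\[ P(\boldsymbol{x}) = \sum_{m=1}^{M} w_m \prod_{n=1}^{N} p_n(x_n \mid m), \]

whereas in a mixture of dependence trees each component factorizes along a component-specific spanning tree,

\[ P(\boldsymbol{x}) = \sum_{m=1}^{M} w_m \, p_{i_1}(x_{i_1} \mid m) \prod_{n=2}^{N} p_{i_n}(x_{i_n} \mid x_{j_n}, m), \]

where, for each component \(m\), the pairs \((i_n, j_n)\) are the edges of its optimal spanning tree (cf. Chow and Liu [3], Meila and Jordan [34]).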


References

1. Borůvka, O.: On a minimal problem. Trans. Moravian Soc. Nat. Sci. (in Czech), No. 3 (1926)

2. Bouguila, N., Ziou, D., Vaillancourt, J.: Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans. Image Process. 13(11), 1533–1543 (2004)

3. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory IT-14(3), 462–467 (1968)

4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1–38 (1977)

5. Day, N.E.: Estimating the components of a mixture of normal distributions. Biometrika 56, 463–474 (1969)

6. Grim, J.: On numerical evaluation of maximum-likelihood estimates for finite mixtures of distributions. Kybernetika 18(3), 173–190 (1982). http://dml.cz/dmlcz/124132

7. Grim, J.: On structural approximating multivariate discrete probability distributions. Kybernetika 20(1), 1–17 (1984). http://dml.cz/dmlcz/125676

8. Grim, J.: Multivariate statistical pattern recognition with nonreduced dimensionality. Kybernetika 22(2), 142–157 (1986). http://dml.cz/dmlcz/125022

9. Grim, J.: Design of multilayer neural networks by information preserving transforms. In: Pessa, E., Penna, M.P., Montesanto, A. (eds.) Third European Congress on Systems Science, pp. 977–982. Edizioni Kappa, Roma (1996)

10. Grim, J.: Information approach to structural optimization of probabilistic neural networks. In: Ferrer, L., et al. (eds.) Proceedings of the 4th System Science European Congress, pp. 527–540. Soc. Espanola de Sistemas Generales, Valencia (1999)

11. Grim, J.: A sequential modification of EM algorithm. In: Gaul, W., Locarek-Junge, H. (eds.) Studies in Classification, Data Analysis and Knowledge Organization, pp. 163–170. Springer (1999)

12. Grim, J.: EM cluster analysis for categorical data. In: Yeung, D.Y., Kwok, J.T., Fred, A. (eds.) Structural, Syntactic and Statistical Pattern Recognition, LNCS 4109, pp. 640–648. Springer, Berlin (2006)

13. Grim, J.: Neuromorphic features of probabilistic neural networks. Kybernetika 43(5), 697–712 (2007). http://dml.cz/dmlcz/135807

14. Grim, J.: Sequential pattern recognition by maximum conditional informativity. Pattern Recogn. Lett. 45C, 39–45 (2014). doi:10.1016/j.patrec.2014.02.024

15. Grim, J.: Approximating probability densities by mixtures of Gaussian dependence trees. In: Hobza, T. (ed.) Proceedings of the SPMS 2014 Stochastic and Physical Monitoring Systems, pp. 43–56. Czech Technical University, Prague (2014)

16. Grim, J., Hora, J.: Iterative principles of recognition in probabilistic neural networks. Neural Netw. 21(6), 838–846 (2008)

17. Grim, J., Hora, J.: Computational properties of probabilistic neural networks. In: Artificial Neural Networks—ICANN 2010, Part II, LNCS 5164, pp. 52–61. Springer, Berlin (2010)

18. Grim, J., Hora, J., Boček, P., Somol, P., Pudil, P.: Statistical model of the 2001 Czech census for interactive presentation. J. Official Stat. 26(4), 673–694 (2010). http://ro.utia.cas.cz/dem.html

19. Grim, J., Kittler, J., Pudil, P., Somol, P.: Multiple classifier fusion in probabilistic neural networks. Pattern Anal. Appl. 5(7), 221–233 (2002)

20. Grim, J., Pudil, P.: Pattern recognition by probabilistic neural networks—mixtures of product components versus mixtures of dependence trees. In: Proceedings of the International Conference on Neural Computation Theory and Applications NCTA 2014, pp. 65–75. SCITEPRESS, Rome (2014)

21. Grim, J., Pudil, P., Somol, P.: Recognition of handwritten numerals by structural probabilistic neural networks. In: Bothe, H., Rojas, R. (eds.) Proceedings of the Second ICSC Symposium on Neural Computation, pp. 528–534. ICSC, Wetaskiwin (2000)

22. Grim, J., Pudil, P., Somol, P.: Boosting in probabilistic neural networks. In: Kasturi, R., Laurendeau, D., Suen, C. (eds.) Proceedings of the 16th International Conference on Pattern Recognition, pp. 136–139. IEEE Computer Society, Los Alamitos (2002)

23. Grim, J., Somol, P., Haindl, M., Daneš, J.: Computer-aided evaluation of screening mammograms based on local texture models. IEEE Trans. Image Process. 18(4), 765–773 (2009)

24. Hasselblad, V.: Estimation of parameters for a mixture of normal distributions. Technometrics 8, 431–444 (1966)

25. Hasselblad, V.: Estimation of finite mixtures of distributions from the exponential family. J. Am. Stat. Assoc. 58, 1459–1471 (1969)

26. Hosmer Jr., D.W.: A comparison of iterative maximum likelihood estimates of the parameters of a mixture of two normal distributions under three different types of sample. Biometrics, 761–770 (1973)

27. Jarník, V.: About a certain minimal problem. Trans. Moravian Soc. Nat. Sci. (in Czech), No. 6, 57–63 (1930)

28. Kruskal, J.B.: On the shortest spanning sub-tree of a graph. Proc. Am. Math. Soc. 7, 48–50 (1956)

29. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

30. Lowd, D., Domingos, P.: Naive Bayes models for probability estimation. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 529–536. ACM (2005)

31. Markley, S.C., Miller, D.J.: Joint parsimonious modeling and model order selection for multivariate Gaussian mixtures. IEEE J. Sel. Top. Sign. Process. 4(3), 548–559 (2010)

32. Meila, M., Jordan, M.I.: Estimating dependency structure as a hidden variable. In: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, vol. 10, pp. 584–590 (1998)

33. Meila, M., Jaakkola, T.: Tractable Bayesian learning of tree belief networks. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 380–388 (2000)

34. Meila, M., Jordan, M.I.: Learning with mixtures of trees. J. Mach. Learn. Res. 1(9), 1–48 (2001)

35. Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36, 1389–1401 (1957)

36. Schlesinger, M.I.: Relation between learning and self-learning in pattern recognition (in Russian). Kibernetika (Kiev), No. 2, 81–88 (1968)

37. Vajda, I.: Theory of Statistical Inference and Information. Kluwer Academic Publishers, Dordrecht and Boston (1989)

38. Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)


Acknowledgments

This work was supported by the Czech Science Foundation Projects No. 14-02652S and P403/12/1557.

Author information

Correspondence to Jiří Grim.

Appendix: Maximum-Weight Spanning Tree

The algorithm of Kruskal (cf. [3, 28]) first sorts all \(N(N-1)/2\) edge weights in descending order. The maximum-weight spanning tree is then constructed sequentially, starting with the two heaviest edges. The remaining edges are added in descending order of weight whenever they do not form a cycle with the previously chosen edges. If several edge weights are equal, multiple solutions are possible, but they can be ignored since all of them achieve the same maximum total weight. Obviously, in the case of dependence-tree mixtures with many components, the application of the Kruskal algorithm may become prohibitive in high-dimensional spaces because of the repeated ordering of the edge weights.
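
To make the procedure concrete, the following minimal C sketch of the described construction is given (it is not taken from the paper; the identifiers Edge, find_root and max_spanning_tree are illustrative assumptions). The edges are sorted in descending order of weight and each edge is accepted if it does not close a cycle, the cycle test being realized by a simple union-find structure:

#include <stdlib.h>

#define N 8                      /* number of variables (nodes); illustrative value */

typedef struct { int u, v; double w; } Edge;

static int parent[N];            /* union-find forest over the nodes */

static int find_root(int x)      /* root lookup with path halving */
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static int by_weight_desc(const void *a, const void *b)
{
    double wa = ((const Edge *)a)->w, wb = ((const Edge *)b)->w;
    return (wa < wb) - (wa > wb);            /* sort in descending order */
}

/* Kruskal: sort all N(N-1)/2 edges by descending weight and accept every
   edge that does not close a cycle, until N-1 edges have been chosen.
   Returns the number of tree edges written to tree[]. */
int max_spanning_tree(Edge *edges, int n_edges, Edge *tree)
{
    int i, chosen = 0;
    for (i = 0; i < N; i++) parent[i] = i;   /* each node is its own root */
    qsort(edges, n_edges, sizeof(Edge), by_weight_desc);
    for (i = 0; i < n_edges && chosen < N - 1; i++) {
        int ru = find_root(edges[i].u), rv = find_root(edges[i].v);
        if (ru != rv) {                      /* no cycle: accept the edge */
            parent[ru] = rv;
            tree[chosen++] = edges[i];
        }
    }
    return chosen;
}

The qsort call over all \(N(N-1)/2\) edges is exactly the ordering step that becomes expensive when the spanning tree has to be recomputed for many components in every EM iteration.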

The algorithm of Prim [35] does not require any ordering of the edge weights. Starting from an arbitrary variable, we choose the neighbor connected by the maximum-weight edge. This first edge of the maximum-weight spanning tree is then extended sequentially by adding the maximum-weight neighbor of the currently chosen subtree. Again, any ties may be broken arbitrarily, since we are not interested in multiple solutions.

Both Kruskal and Prim refer to an "obscure Czech paper" by Otakar Borůvka [1] from 1926, which gives an alternative construction of the minimum-weight spanning tree together with the corresponding proof of uniqueness. Moreover, Prim's algorithm had already been described in 1930 by the Czech mathematician Vojtěch Jarník (cf. [27], in Czech). The algorithm of Prim can be summarized as follows (in C code):

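The original listing appears in the paper only as a figure and is not reproduced here; the following is a minimal sketch of Prim's algorithm for a maximum-weight spanning tree over a dense symmetric weight matrix (identifiers such as in_tree, best_nbr and the toy weights in main are illustrative assumptions, not taken from the paper):

#include <stdio.h>

#define N 8                     /* number of variables (nodes); illustrative value */

/* Prim: build the maximum-weight spanning tree of the complete graph given
   by a symmetric weight matrix w[N][N]; the k-th tree edge is returned in
   edges[k][0], edges[k][1] for k = 0, ..., N-2. */
void max_spanning_tree(double w[N][N], int edges[N - 1][2])
{
    int in_tree[N];             /* 1 if the node is already in the subtree */
    int best_nbr[N];            /* heaviest tree neighbor found so far */
    double best_w[N];           /* weight of that heaviest edge */
    int i, k;

    for (i = 0; i < N; i++) {   /* start the subtree from node 0 */
        in_tree[i] = 0;
        best_nbr[i] = 0;
        best_w[i] = w[0][i];
    }
    in_tree[0] = 1;

    for (k = 0; k < N - 1; k++) {            /* add N-1 edges */
        int next = -1;
        for (i = 0; i < N; i++)              /* outside node with heaviest link */
            if (!in_tree[i] && (next < 0 || best_w[i] > best_w[next]))
                next = i;

        edges[k][0] = best_nbr[next];
        edges[k][1] = next;
        in_tree[next] = 1;

        for (i = 0; i < N; i++)              /* update heaviest links to the subtree */
            if (!in_tree[i] && w[next][i] > best_w[i]) {
                best_w[i] = w[next][i];
                best_nbr[i] = next;
            }
    }
}

int main(void)
{
    double w[N][N];
    int edges[N - 1][2];
    int i, j, k;

    /* toy symmetric weight matrix for illustration only */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            w[i][j] = (i == j) ? 0.0 : 1.0 / (1.0 + (i - j) * (i - j));

    max_spanning_tree(w, edges);
    for (k = 0; k < N - 1; k++)
        printf("edge %d: (%d, %d)\n", k, edges[k][0], edges[k][1]);
    return 0;
}

Each of the N-1 steps only scans the nodes not yet in the subtree, so no global ordering of the edge weights is needed, which is why this variant scales better when the tree has to be re-estimated for every component in each EM iteration.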

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Grim, J., Pudil, P. (2016). Mixtures of Product Components Versus Mixtures of Dependence Trees. In: Merelo, J.J., Rosa, A., Cadenas, J.M., Dourado, A., Madani, K., Filipe, J. (eds) Computational Intelligence. IJCCI 2014. Studies in Computational Intelligence, vol 620. Springer, Cham. https://doi.org/10.1007/978-3-319-26393-9_22

  • DOI: https://doi.org/10.1007/978-3-319-26393-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26391-5

  • Online ISBN: 978-3-319-26393-9

  • eBook Packages: Engineering, Engineering (R0)
