
Mixtures of Product Components Versus Mixtures of Dependence Trees

Abstract

Mixtures of product components assume independence of variables given the index of the component. They can be efficiently estimated from data by means of the EM algorithm and have a number of other useful properties. On the other hand, by considering mixtures of dependence trees, we can explicitly describe the statistical relationships between pairs of variables at the level of individual components, and the approximation power of the resulting mixture may therefore increase substantially. However, in an application to the classification of numerals we have found that both models perform comparably and that the contribution of the dependence-tree structures to the log-likelihood criterion decreases in the course of the EM iterations. Thus the optimal estimate of a dependence-tree mixture tends to reduce to a simple product mixture model.
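
For concreteness, the two model classes can be written as follows (the notation is assumed for this summary and is not quoted from the paper). A mixture of product components takes the form

\[ P(\boldsymbol{x}) = \sum_{m=1}^{M} w_m \prod_{n=1}^{N} p_n(x_n \mid m), \]

whereas in a mixture of dependence trees each component factorizes along a component-specific spanning tree,

\[ P(\boldsymbol{x}) = \sum_{m=1}^{M} w_m \, p_{i_1}(x_{i_1} \mid m) \prod_{n=2}^{N} p_{i_n}(x_{i_n} \mid x_{j_n}, m), \]

where, for each component \(m\), the pairs \((i_n, j_n)\) are the edges of its optimal spanning tree (cf. Chow and Liu [3], Meila and Jordan [34]).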


References

1. Borůvka, O.: On a minimal problem. Trans. Moravian Soc. Nat. Sci. (in Czech), No. 3 (1926)

2. Bouguila, N., Ziou, D., Vaillancourt, J.: Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans. Image Process. 13(11), 1533–1543 (2004)

3. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory IT-14(3), 462–467 (1968)

4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1–38 (1977)

5. Day, N.E.: Estimating the components of a mixture of normal distributions. Biometrika 56, 463–474 (1969)

6. Grim, J.: On numerical evaluation of maximum-likelihood estimates for finite mixtures of distributions. Kybernetika 18(3), 173–190 (1982). http://dml.cz/dmlcz/124132

7. Grim, J.: On structural approximating multivariate discrete probability distributions. Kybernetika 20(1), 1–17 (1984). http://dml.cz/dmlcz/125676

8. Grim, J.: Multivariate statistical pattern recognition with nonreduced dimensionality. Kybernetika 22(2), 142–157 (1986). http://dml.cz/dmlcz/125022

9. Grim, J.: Design of multilayer neural networks by information preserving transforms. In: Pessa, E., Penna, M.P., Montesanto, A. (eds.) Third European Congress on Systems Science, pp. 977–982. Edizioni Kappa, Roma (1996)

10. Grim, J.: Information approach to structural optimization of probabilistic neural networks. In: Ferrer, L., et al. (eds.) Proceedings of the 4th System Science European Congress, pp. 527–540. Soc. Espanola de Sistemas Generales, Valencia (1999)

11. Grim, J.: A sequential modification of EM algorithm. In: Gaul, W., Locarek-Junge, H. (eds.) Studies in Classification, Data Analysis and Knowledge Organization, pp. 163–170. Springer (1999)

12. Grim, J.: EM cluster analysis for categorical data. In: Yeung, D.Y., Kwok, J.T., Fred, A. (eds.) Structural, Syntactic and Statistical Pattern Recognition, LNCS 4109, pp. 640–648. Springer, Berlin (2006)

13. Grim, J.: Neuromorphic features of probabilistic neural networks. Kybernetika 43(5), 697–712 (2007). http://dml.cz/dmlcz/135807

14. Grim, J.: Sequential pattern recognition by maximum conditional informativity. Pattern Recogn. Lett. 45C, 39–45 (2014). doi:10.1016/j.patrec.2014.02.024

15. Grim, J.: Approximating probability densities by mixtures of Gaussian dependence trees. In: Hobza, T. (ed.) Proceedings of the SPMS 2014 Stochastic and Physical Monitoring Systems, pp. 43–56. Czech Technical University, Prague (2014)

16. Grim, J., Hora, J.: Iterative principles of recognition in probabilistic neural networks. Neural Netw. 21(6), 838–846 (2008)

17. Grim, J., Hora, J.: Computational properties of probabilistic neural networks. In: Artificial Neural Networks—ICANN 2010, Part II, LNCS 5164, pp. 52–61. Springer, Berlin (2010)

18. Grim, J., Hora, J., Boček, P., Somol, P., Pudil, P.: Statistical model of the 2001 Czech census for interactive presentation. J. Official Stat. 26(4), 673–694 (2010). http://ro.utia.cas.cz/dem.html

19. Grim, J., Kittler, J., Pudil, P., Somol, P.: Multiple classifier fusion in probabilistic neural networks. Pattern Anal. Appl. 5(7), 221–233 (2002)

20. Grim, J., Pudil, P.: Pattern recognition by probabilistic neural networks—mixtures of product components versus mixtures of dependence trees. In: Proceedings of the International Conference on Neural Computation Theory and Applications NCTA 2014, pp. 65–75. SCITEPRESS, Rome (2014)

21. Grim, J., Pudil, P., Somol, P.: Recognition of handwritten numerals by structural probabilistic neural networks. In: Bothe, H., Rojas, R. (eds.) Proceedings of the Second ICSC Symposium on Neural Computation, pp. 528–534. ICSC, Wetaskiwin (2000)

22. Grim, J., Pudil, P., Somol, P.: Boosting in probabilistic neural networks. In: Kasturi, R., Laurendeau, D., Suen, C. (eds.) Proceedings of the 16th International Conference on Pattern Recognition, pp. 136–139. IEEE Computer Society, Los Alamitos (2002)

23. Grim, J., Somol, P., Haindl, M., Daneš, J.: Computer-aided evaluation of screening mammograms based on local texture models. IEEE Trans. Image Process. 18(4), 765–773 (2009)

24. Hasselblad, V.: Estimation of parameters for a mixture of normal distributions. Technometrics 8, 431–444 (1966)

25. Hasselblad, V.: Estimation of finite mixtures of distributions from the exponential family. J. Am. Stat. Assoc. 58, 1459–1471 (1969)

26. Hosmer Jr., D.W.: A comparison of iterative maximum likelihood estimates of the parameters of a mixture of two normal distributions under three different types of sample. Biometrics, 761–770 (1973)

27. Jarník, V.: About a certain minimal problem. Trans. Moravian Soc. Nat. Sci. (in Czech), No. 6, 57–63 (1930)

28. Kruskal, J.B.: On the shortest spanning sub-tree of a graph. Proc. Am. Math. Soc. 7, 48–50 (1956)

29. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

30. Lowd, D., Domingos, P.: Naive Bayes models for probability estimation. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 529–536. ACM (2005)

31. Markley, S.C., Miller, D.J.: Joint parsimonious modeling and model order selection for multivariate Gaussian mixtures. IEEE J. Sel. Top. Sign. Process. 4(3), 548–559 (2010)

32. Meila, M., Jordan, M.I.: Estimating dependency structure as a hidden variable. In: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, vol. 10, pp. 584–590 (1998)

33. Meila, M., Jaakkola, T.: Tractable Bayesian learning of tree belief networks. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 380–388 (2000)

34. Meila, M., Jordan, M.I.: Learning with mixtures of trees. J. Mach. Learn. Res. 1(9), 1–48 (2001)

35. Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36, 1389–1401 (1957)

36. Schlesinger, M.I.: Relation between learning and self-learning in pattern recognition (in Russian). Kibernetika (Kiev), No. 2, 81–88 (1968)

37. Vajda, I.: Theory of Statistical Inference and Information. Kluwer Academic Publishers, Dordrecht and Boston (1989)

38. Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)


Acknowledgments

This work was supported by the Czech Science Foundation Projects No. 14-02652S and P403/12/1557.

Author information

Correspondence to Jiří Grim.

Appendix: Maximum-Weight Spanning Tree

The algorithm of Kruskal (cf. [3, 28]) first sorts all \(N(N-1)/2\) edge weights in descending order. The maximum-weight spanning tree is then constructed sequentially, starting with the two heaviest edges. The remaining edges are added in descending order of weight whenever they do not form a cycle with the previously chosen edges. If several edge weights are equal, multiple solutions are possible, but they can be ignored since all of them achieve the same maximum total weight. Obviously, in the case of dependence-tree mixtures with many components, the application of the Kruskal algorithm may become prohibitive in high-dimensional spaces because of the repeated ordering of the edge weights.
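
To make the procedure concrete, the following minimal C sketch of the described construction is given (it is not taken from the paper; the identifiers Edge, find_root and max_spanning_tree are illustrative assumptions). The edges are sorted in descending order of weight and each edge is accepted if it does not close a cycle, the cycle test being realized by a simple union-find structure:

#include <stdlib.h>

#define N 8                      /* number of variables (nodes); illustrative value */

typedef struct { int u, v; double w; } Edge;

static int parent[N];            /* union-find forest over the nodes */

static int find_root(int x)      /* root lookup with path halving */
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static int by_weight_desc(const void *a, const void *b)
{
    double wa = ((const Edge *)a)->w, wb = ((const Edge *)b)->w;
    return (wa < wb) - (wa > wb);            /* sort in descending order */
}

/* Kruskal: sort all N(N-1)/2 edges by descending weight and accept every
   edge that does not close a cycle, until N-1 edges have been chosen.
   Returns the number of tree edges written to tree[]. */
int max_spanning_tree(Edge *edges, int n_edges, Edge *tree)
{
    int i, chosen = 0;
    for (i = 0; i < N; i++) parent[i] = i;   /* each node is its own root */
    qsort(edges, n_edges, sizeof(Edge), by_weight_desc);
    for (i = 0; i < n_edges && chosen < N - 1; i++) {
        int ru = find_root(edges[i].u), rv = find_root(edges[i].v);
        if (ru != rv) {                      /* no cycle: accept the edge */
            parent[ru] = rv;
            tree[chosen++] = edges[i];
        }
    }
    return chosen;
}

The qsort call over all \(N(N-1)/2\) edges is exactly the ordering step that becomes expensive when the spanning tree has to be recomputed for many components in every EM iteration.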

The algorithm of Prim [35] does not require any ordering of the edge weights. Starting from an arbitrary variable, we choose the neighbor connected by the maximum-weight edge. This first edge of the maximum-weight spanning tree is then extended sequentially by adding the maximum-weight neighbor of the currently chosen subtree. Again, any ties may be broken arbitrarily, since we are not interested in multiple solutions.

Both Kruskal and Prim refer to an "obscure Czech paper" by Otakar Borůvka [1] from 1926, which gives an alternative construction of the minimum-weight spanning tree together with the corresponding proof of uniqueness. Moreover, Prim's algorithm had already been described in 1930 by the Czech mathematician Vojtěch Jarník (cf. [27], in Czech). The algorithm of Prim can be summarized as follows (in C code):

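The original listing appears in the paper only as a figure and is not reproduced here; the following is a minimal sketch of Prim's algorithm for a maximum-weight spanning tree over a dense symmetric weight matrix (identifiers such as in_tree, best_nbr and the toy weights in main are illustrative assumptions, not taken from the paper):

#include <stdio.h>

#define N 8                     /* number of variables (nodes); illustrative value */

/* Prim: build the maximum-weight spanning tree of the complete graph given
   by a symmetric weight matrix w[N][N]; the k-th tree edge is returned in
   edges[k][0], edges[k][1] for k = 0, ..., N-2. */
void max_spanning_tree(double w[N][N], int edges[N - 1][2])
{
    int in_tree[N];             /* 1 if the node is already in the subtree */
    int best_nbr[N];            /* heaviest tree neighbor found so far */
    double best_w[N];           /* weight of that heaviest edge */
    int i, k;

    for (i = 0; i < N; i++) {   /* start the subtree from node 0 */
        in_tree[i] = 0;
        best_nbr[i] = 0;
        best_w[i] = w[0][i];
    }
    in_tree[0] = 1;

    for (k = 0; k < N - 1; k++) {            /* add N-1 edges */
        int next = -1;
        for (i = 0; i < N; i++)              /* outside node with heaviest link */
            if (!in_tree[i] && (next < 0 || best_w[i] > best_w[next]))
                next = i;

        edges[k][0] = best_nbr[next];
        edges[k][1] = next;
        in_tree[next] = 1;

        for (i = 0; i < N; i++)              /* update heaviest links to the subtree */
            if (!in_tree[i] && w[next][i] > best_w[i]) {
                best_w[i] = w[next][i];
                best_nbr[i] = next;
            }
    }
}

int main(void)
{
    double w[N][N];
    int edges[N - 1][2];
    int i, j, k;

    /* toy symmetric weight matrix for illustration only */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            w[i][j] = (i == j) ? 0.0 : 1.0 / (1.0 + (i - j) * (i - j));

    max_spanning_tree(w, edges);
    for (k = 0; k < N - 1; k++)
        printf("edge %d: (%d, %d)\n", k, edges[k][0], edges[k][1]);
    return 0;
}

Each of the N-1 steps only scans the nodes not yet in the subtree, so no global ordering of the edge weights is needed, which is why this variant scales better when the tree has to be re-estimated for every component in each EM iteration.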

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Grim, J., Pudil, P. (2016). Mixtures of Product Components Versus Mixtures of Dependence Trees. In: Merelo, J.J., Rosa, A., Cadenas, J.M., Dourado, A., Madani, K., Filipe, J. (eds) Computational Intelligence. IJCCI 2014. Studies in Computational Intelligence, vol 620. Springer, Cham. https://doi.org/10.1007/978-3-319-26393-9_22

  • DOI: https://doi.org/10.1007/978-3-319-26393-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26391-5

  • Online ISBN: 978-3-319-26393-9

  • eBook Packages: Engineering, Engineering (R0)
