Hyper-Parameter-Free Generative Modelling with Deep Boltzmann Trees

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11907)

Abstract

Deep neural networks achieve state-of-the-art results in various classification and synthetic data generation tasks. However, little is known about why depth improves a model. We investigate the structure of stochastic deep neural networks, also known as Deep Boltzmann Machines, to shed some light on this issue. While the best known results postulate an exponential dependence between the number of visible units and the depth of the model, we show that the required depth is upper bounded by the longest path in the underlying junction tree, which is at most linear in the number of visible units. Moreover, we show that the conditional independence structure of any categorical Deep Boltzmann Machine contains a sub-tree that allows the consistent estimation of the full joint probability mass function of all visible units. We connect our results to \(l_1\)-regularized maximum-likelihood estimation and Chow-Liu trees. Based on our theoretical findings, we present a new tractable version of Deep Boltzmann Machines, namely the Deep Boltzmann Tree (DBT). We provide a hyper-parameter-free algorithm for learning the DBT from data, and propose a new initialization method to enforce convergence to good solutions. Our findings provide some theoretical evidence for why a deep model might be beneficial. Experimental results on benchmark data show that the DBT is a theoretically sound alternative to likelihood-free generative models.
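To make the Chow-Liu connection mentioned in the abstract concrete: a Chow-Liu tree is obtained by estimating pairwise mutual information from data and keeping a maximum-weight spanning tree over the variables. The sketch below is an illustrative aside, not the paper's DBT learning algorithm; the function name chow_liu_tree and the toy data are assumptions made for this example.

```python
import numpy as np
from itertools import combinations


def chow_liu_tree(data, n_states):
    """Chow-Liu tree: maximum-weight spanning tree over pairwise
    mutual information of categorical variables (illustrative sketch).

    data: (n_samples, n_vars) integer array with values in [0, n_states).
    Returns a list of undirected tree edges (i, j).
    """
    n, d = data.shape
    weighted_pairs = []
    for i, j in combinations(range(d), 2):
        # Empirical joint distribution of variables i and j.
        joint = np.zeros((n_states, n_states))
        for a, b in zip(data[:, i], data[:, j]):
            joint[a, b] += 1.0
        joint /= n
        p_i, p_j = joint.sum(axis=1), joint.sum(axis=0)
        mask = joint > 0
        # Mutual information I(X_i; X_j) estimated from counts.
        mi = np.sum(joint[mask] * np.log(joint[mask] / np.outer(p_i, p_j)[mask]))
        weighted_pairs.append((mi, i, j))

    # Kruskal-style maximum spanning tree on the MI weights (union-find).
    parent = list(range(d))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for mi, i, j in sorted(weighted_pairs, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Toy usage: x2 is a noisy copy of x0, x1 is independent noise,
# so the recovered tree should contain the edge (0, 2).
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 1000)
x1 = rng.integers(0, 2, 1000)
x2 = (x0 ^ (rng.random(1000) < 0.1)).astype(np.int64)
print(chow_liu_tree(np.stack([x0, x1, x2], axis=1), n_states=2))
```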


Notes

  1. It turns out that DBTs consist of exactly two hidden layers. While this kind of depth seems rather “shallow”, the original work on DBMs [20] defines the DBM as a restricted Boltzmann machine with more than one hidden layer. Thus, to remain consistent with the common terminology, we denote our proposed model as “deep”.

  2. We have to stress that this is not a hyper-parameter of the DBT. Moreover, 0.05 is not “tuned” either, as it is the default value in Chordalysis.

References

  1. Ba, J., Caruana, R.: Do deep nets really need to be deep? In: Advances in Neural Information Processing Systems (NIPS), pp. 2654–2662 (2014)

  2. Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems (NIPS) 30, pp. 6241–6250 (2017)

  3. Bucila, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 535–541 (2006)

  4. Bulatov, A., Grohe, M.: The complexity of partition functions. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 294–306. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27836-8_27

  5. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)

  6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–38 (1977)

  7. Fulkerson, D.R., Gross, O.A.: Incidence matrices and interval graphs. Pac. J. Math. 15(3), 835–855 (1965)

  8. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 249–256 (2010)

  9. Golowich, N., Rakhlin, A., Shamir, O.: Size-independent sample complexity of neural networks. In: Conference On Learning Theory (COLT), pp. 297–299 (2018)

  10. Goodfellow, I.J., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (NIPS), pp. 2672–2680 (2014)

  11. Hammersley, J.M., Clifford, P.: Markov fields on finite graphs and lattices. Unpublished manuscript (1971)

  12. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38

  13. Li, Y., Turner, R.E.: Gradient estimators for implicit models. In: International Conference on Learning Representations (ICLR) (2018)

  14. Montavon, G., Braun, M.L., Müller, K.: Deep Boltzmann machines as feed-forward hierarchies. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 798–804 (2012)

  15. Montúfar, G.: Deep narrow Boltzmann machines are universal approximators. In: International Conference on Learning Representations (ICLR) (2015)

  16. Montúfar, G., Morton, J.: Discrete restricted Boltzmann machines. J. Mach. Learn. Res. 16, 653–672 (2015)

  17. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/\(k^2\)). Sov. Math. Dokl. 27(2), 372–376 (1983)

  18. Neyshabur, B., Tomioka, R., Srebro, N.: Norm-based capacity control in neural networks. In: Conference on Learning Theory (COLT), pp. 1376–1401 (2015)

  19. Papandreou, G., Yuille, A.L.: Perturb-and-MAP random fields: using discrete optimization to learn and sample from energy models. In: IEEE International Conference on Computer Vision (ICCV), pp. 193–200 (2011)

  20. Salakhutdinov, R., Hinton, G.E.: Deep Boltzmann machines. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 448–455 (2009)

  21. Salakhutdinov, R., Larochelle, H.: Efficient learning of deep Boltzmann machines. In: Artificial Intelligence and Statistics (AISTATS), pp. 693–700 (2010)

  22. Shazeer, N., et al.: Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. CoRR abs/1701.06538 (2017)

  23. Valiant, L.G.: The complexity of enumeration and reliability problems. SIAM J. Comput. 8(3), 410–421 (1979)

  24. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  25. Warde-Farley, D., Bengio, Y.: Improving generative adversarial networks with denoising feature matching. In: International Conference on Learning Representations (ICLR) (2017)

  26. Webb, G.I., Petitjean, F.: A multiple test correction for streams and cascades of statistical hypothesis tests. In: International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1255–1264 (2016)

  27. Yang, E., Lozano, A.C., Ravikumar, P.: Elementary estimators for graphical models. In: Advances in Neural Information Processing Systems (NIPS) 27, pp. 2159–2167 (2014)


Acknowledgments

This research has been funded by the Federal Ministry of Education and Research of Germany as part of the competence center for machine learning ML2R (01S18038A).

Author information


Corresponding author

Correspondence to Nico Piatkowski.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 167 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Piatkowski, N. (2020). Hyper-Parameter-Free Generative Modelling with Deep Boltzmann Trees. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11907. Springer, Cham. https://doi.org/10.1007/978-3-030-46147-8_25

  • DOI: https://doi.org/10.1007/978-3-030-46147-8_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46146-1

  • Online ISBN: 978-3-030-46147-8
