DOI: 10.1145/3292500.3332294
Tutorial

Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks

Published: 25 July 2019

Abstract

There have long been connections between statistical mechanics and neural networks, but in recent decades these connections have withered. However, in light of recent failings of statistical learning theory and stochastic optimization theory to describe, even qualitatively, many properties of production-quality neural network models, researchers have revisited ideas from the statistical mechanics of neural networks. This tutorial will provide an overview of the area; it will go into detail on how connections with random matrix theory and heavy-tailed random matrix theory can lead to a practical phenomenological theory for large-scale deep neural networks; and it will describe future directions.
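
To make the abstract's claim concrete: in this line of work one takes the weight matrix W of a trained layer, forms the correlation matrix X = W^T W / N, computes its eigenvalue spectrum (the empirical spectral density, or ESD), and fits a power-law exponent α to the tail of that spectrum; heavier tails (smaller α) are read as evidence of stronger implicit self-regularization. The sketch below, in plain Python with NumPy, illustrates this pipeline. It is a minimal sketch, not the authors' exact procedure: the Hill-estimator tail fraction and the random stand-in weight matrix are illustrative assumptions.

    import numpy as np

    def esd(W):
        """Empirical spectral density: eigenvalues of X = W^T W / N,
        where N is the larger dimension of W (returned in ascending order)."""
        W = np.asarray(W, dtype=np.float64)
        N = max(W.shape)
        return np.linalg.eigvalsh(W.T @ W / N)

    def hill_alpha(eigs, tail_fraction=0.1):
        """Hill estimator of the power-law exponent of the ESD tail.
        The 10% tail fraction is an illustrative choice, not a tuned value."""
        eigs = np.sort(eigs)
        k = max(int(tail_fraction * len(eigs)), 2)
        tail = eigs[-k:]  # the k largest eigenvalues
        return 1.0 + k / np.sum(np.log(tail / tail[0]))

    # Stand-in for a trained layer. An untrained Gaussian random matrix has a
    # Marchenko-Pastur spectrum with a sharp edge, so alpha comes out large;
    # in the heavy-tailed phenomenology, well-trained layers are reported to
    # show much smaller exponents (roughly alpha between 2 and 4).
    rng = np.random.default_rng(0)
    W = rng.normal(size=(500, 1000))
    lam = esd(W)
    print(f"largest eigenvalue: {lam[-1]:.3f}  Hill alpha: {hill_alpha(lam):.2f}")

Repeating this fit layer by layer across a production network, and tracking how α moves as training proceeds, is the kind of spectrum-based diagnostic the tutorial develops.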

Supplementary Material

MP4 File (p3239-martin.mp4)


Cited By

  • (2021) Selecting the Best Routing Traffic for Packets in LAN via Machine Learning to Achieve the Best Strategy. Complexity, 2021:1. DOI: 10.1155/2021/5572881. Online publication date: 15-Apr-2021.
  • (2021) Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12:1. DOI: 10.1038/s41467-021-24025-8. Online publication date: 5-Jul-2021.
  • (2020) A random matrix analysis of random Fourier features. Proceedings of the 34th International Conference on Neural Information Processing Systems, 13939–13950. DOI: 10.5555/3495724.3496893. Online publication date: 6-Dec-2020.


    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN: 9781450362016
    DOI: 10.1145/3292500
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 25 July 2019


    Author Tags

    1. heavy-tailed random matrix theory
    2. neural networks
    3. random matrix theory
    4. statistical mechanics

    Qualifiers

    • Tutorial

    Conference

    KDD '19

    Acceptance Rates

    KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions (9%)
    Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)


