DOI: 10.1145/3292500.3332294
Tutorial

Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks

Published: 25 July 2019

Abstract

There have long been connections between statistical mechanics and neural networks, but in recent decades these connections have withered. However, in light of recent failings of statistical learning theory and stochastic optimization theory to describe, even qualitatively, many properties of production-quality neural network models, researchers have revisited ideas from the statistical mechanics of neural networks. This tutorial will provide an overview of the area; it will go into detail on how connections with random matrix theory and heavy-tailed random matrix theory can lead to a practical phenomenological theory for large-scale deep neural networks; and it will describe future directions.
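
To make the abstract's claim concrete: in this line of work one takes the weight matrix W of a trained layer, forms the correlation matrix X = W^T W / N, computes its eigenvalue spectrum (the empirical spectral density, or ESD), and fits a power-law exponent α to the tail of that spectrum; heavier tails (smaller α) are read as evidence of stronger implicit self-regularization. The sketch below, in plain Python with NumPy, illustrates this pipeline. It is a minimal sketch, not the authors' exact procedure: the Hill-estimator tail fraction and the random stand-in weight matrix are illustrative assumptions.

    import numpy as np

    def esd(W):
        """Empirical spectral density: eigenvalues of X = W^T W / N,
        where N is the larger dimension of W (returned in ascending order)."""
        W = np.asarray(W, dtype=np.float64)
        N = max(W.shape)
        return np.linalg.eigvalsh(W.T @ W / N)

    def hill_alpha(eigs, tail_fraction=0.1):
        """Hill estimator of the power-law exponent of the ESD tail.
        The 10% tail fraction is an illustrative choice, not a tuned value."""
        eigs = np.sort(eigs)
        k = max(int(tail_fraction * len(eigs)), 2)
        tail = eigs[-k:]  # the k largest eigenvalues
        return 1.0 + k / np.sum(np.log(tail / tail[0]))

    # Stand-in for a trained layer. An untrained Gaussian random matrix has a
    # Marchenko-Pastur spectrum with a sharp edge, so alpha comes out large;
    # in the heavy-tailed phenomenology, well-trained layers are reported to
    # show much smaller exponents (roughly alpha between 2 and 4).
    rng = np.random.default_rng(0)
    W = rng.normal(size=(500, 1000))
    lam = esd(W)
    print(f"largest eigenvalue: {lam[-1]:.3f}  Hill alpha: {hill_alpha(lam):.2f}")

Repeating this fit layer by layer across a production network, and tracking how α moves as training proceeds, is the kind of spectrum-based diagnostic the tutorial develops.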

Supplementary Material

MP4 File (p3239-martin.mp4)


Cited By

  • (2021) Selecting the Best Routing Traffic for Packets in LAN via Machine Learning to Achieve the Best Strategy. Complexity, 2021:1. DOI: 10.1155/2021/5572881. Online publication date: 15-Apr-2021.
  • (2021) Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12:1. DOI: 10.1038/s41467-021-24025-8. Online publication date: 5-Jul-2021.
  • (2020) A random matrix analysis of random Fourier features. Proceedings of the 34th International Conference on Neural Information Processing Systems, 13939–13950. DOI: 10.5555/3495724.3496893. Online publication date: 6-Dec-2020.


    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN: 9781450362016
    DOI: 10.1145/3292500
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 25 July 2019


    Author Tags

    1. heavy-tailed random matrix theory
    2. neural networks
    3. random matrix theory
    4. statistical mechanics

    Qualifiers

    • Tutorial

    Conference

    KDD '19

    Acceptance Rates

    KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions (9%)
    Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)


