Abstract
If we view neural network optimization as a model selection problem, the implicit space can be constrained by the normalizing factor, the minimum description length of the optimal universal code. Inspired by the adaptation phenomenon of biological neuronal firing, we propose a class of reparameterizations of the activations in a neural network that takes into account the statistical regularity in the implicit space under the Minimum Description Length (MDL) principle. We introduce an incremental version of computing this universal code as the normalized maximum likelihood, and we demonstrate its flexibility to include data priors, such as top-down attention and other oracle information, and its compatibility with batch normalization and layer normalization. Empirical results show that the proposed method outperforms existing normalization methods in tackling limited and imbalanced data drawn from a non-stationary distribution, benchmarked on computer vision and reinforcement learning tasks. As an unsupervised attention mechanism over the input data, this biologically plausible normalization has the potential to handle other complicated real-world scenarios, as well as reinforcement learning settings where the rewards are sparse and non-uniform. Further research is proposed to investigate these scenarios and explore the behavior of the different variants.
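To make the mechanism concrete, below is a minimal sketch of one plausible reading of the incremental normalized maximum likelihood (NML) normalization described above: each unit's activation is modeled by a Gaussian whose maximum-likelihood parameters and normalizer COMP are updated one observation at a time, and the activation is rescaled by its resulting code length. The class name IncrementalMDLNorm and all implementation details are illustrative assumptions, not the authors' released code (see the link in note 2).

import numpy as np

class IncrementalMDLNorm:
    """Rescale each unit's activation by its code length -log NML(x),
    where NML(x) = p(x | theta_hat) / COMP and COMP accumulates the
    maximum likelihoods of the activations seen so far (assumption)."""

    def __init__(self, n_units, eps=1e-8):
        self.count = 0
        self.mean = np.zeros(n_units)       # running MLE mean per unit
        self.var = np.ones(n_units)         # running MLE variance per unit
        self.comp = np.full(n_units, eps)   # incremental normalizer COMP_t
        self.eps = eps

    def _update_gaussian(self, x):
        # Welford-style incremental maximum-likelihood Gaussian fit.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

    def __call__(self, x):
        # x: activations of one batch element, shape (n_units,)
        self._update_gaussian(x)
        std = np.sqrt(self.var + self.eps)
        # Likelihood of the current point under the running Gaussian fit.
        p = np.exp(-0.5 * ((x - self.mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
        # Incremental step: COMP_t = COMP_{t-1} + p(x_t | theta_hat_t).
        self.comp += p
        code_length = -np.log(p / self.comp + self.eps)
        # Rarer (more surprising) activations receive a larger rescaling,
        # acting as an unsupervised attention signal over the input.
        return code_length * x

norm = IncrementalMDLNorm(n_units=4)
for x in np.random.randn(100, 4):
    y = norm(x)   # rescaled activations, same shape as x

In a deep network, such a sketch would be applied per layer, analogously to how batch or layer normalization wraps each layer's pre-activations.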
Notes
1. In continuous data streams or time-series analysis, the incrementation step can be replaced by integrating over the seen territory of the probability distribution X of the data (see the equation sketch after these notes).
2. The raw data and code to reproduce the results can be downloaded at https://app.box.com/s/ruycgz8p7rh30taj38d8dkc0h1ptltg1.
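As a sketch of note 1 (assuming $\hat{\theta}(x)$ denotes the maximum-likelihood parameters evaluated at datum $x$; the notation is ours), the incremental update of the normalizer and its continuous counterpart would read:

\[
\mathrm{COMP}_t \;=\; \sum_{i=1}^{t} p\big(x_i \mid \hat{\theta}(x_i)\big)
\qquad \longrightarrow \qquad
\mathrm{COMP} \;=\; \int_{\mathcal{X}_{\mathrm{seen}}} p\big(x \mid \hat{\theta}(x)\big)\, dx,
\]

where $\mathcal{X}_{\mathrm{seen}}$ is the territory of the distribution $X$ observed so far.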
References
Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
Blakemore, C., Campbell, F.W.: Adaptation to spatial stimuli. J. Physiol. 200(1), 11P–13P (1969)
Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
Ding, S., Cueva, C.J., Tsodyks, M., Qian, N.: Visual perception as retrospective Bayesian decoding from high- to low-level features. Proc. Nat. Acad. Sci. 114(43), E9115–E9124 (2017)
Dragoi, V., Sharma, J., Sur, M.: Adaptation-induced plasticity of orientation tuning in adult visual cortex. Neuron 28(1), 287–298 (2000)
Grünwald, P.D.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Hinton, G., Van Camp, D.: Keeping neural networks simple by minimizing the description length of the weights. In: Proceedings of the 6th Annual ACM Conference on Computational Learning Theory, pp. 5–13. ACM (1993)
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
LeCun, Y.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998)
Lin, B., Bouneffouf, D., Cecchi, G.A., Rish, I.: Contextual bandit with adaptive feature extraction. In: 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 937–944. IEEE (2018)
Lin, L.J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8(3–4), 293–321 (1992)
Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Myung, J.I., Navarro, D.J., Pitt, M.A.: Model selection by normalized maximum likelihood. J. Math. Psychol. 50(2), 167–179 (2006)
Qian, N., Zhang, J.: Neuronal firing rate as code length: a hypothesis. Comput. Brain Behav., 1–20 (2019)
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore (1989)
Rissanen, J.: Strong optimality of the normalized ML models as universal codes and information in data. IEEE Trans. Inf. Theory 47(5), 1712–1717 (2001)
Salimans, T., Kingma, D.P.: Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In: Advances in Neural Information Processing Systems, pp. 901–909 (2016)
Zemel, R.S., Hinton, G.E.: Learning population codes by minimizing description length. In: Unsupervised Learning, pp. 261–276. Bradford Company (1999)
Zhang, J.: Model selection with informative normalized maximum likelihood: Data prior and model prior. In: Descriptive and Normative Approaches to Human Behavior, pp. 303–319. World Scientific (2012)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lin, B. (2019). Neural Networks as Model Selection with Incremental MDL Normalization. In: Zeng, A., Pan, D., Hao, T., Zhang, D., Shi, Y., Song, X. (eds) Human Brain and Artificial Intelligence. HBAI 2019. Communications in Computer and Information Science, vol 1072. Springer, Singapore. https://doi.org/10.1007/978-981-15-1398-5_14
DOI: https://doi.org/10.1007/978-981-15-1398-5_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1397-8
Online ISBN: 978-981-15-1398-5
eBook Packages: Computer Science, Computer Science (R0)