Centering Neural Network Gradient Factors

Chapter

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7700)

Abstract

It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [15]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network’s generalization ability.

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
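
To make the abstract's recipe concrete, here is a minimal NumPy sketch of the idea, not the chapter's reference implementation: a toy tanh network with one shortcut connection, in which input activities, hidden activities, error signals, and activation-function slopes are each centered by subtracting their mean before entering the weight updates. Plain batch means stand in for the running averages a practical implementation would maintain, bias terms are omitted, and all identifiers (W1, W2, Ws, tanh_slope) are hypothetical.

```python
import numpy as np

# Minimal sketch of gradient-factor centering (all names hypothetical).
# Batch means stand in for running averages; bias weights, which would
# absorb the subtracted means, are omitted for brevity.
rng = np.random.default_rng(0)

def tanh_slope(x):
    return 1.0 - np.tanh(x) ** 2

# Toy regression data.
n, d, h = 64, 3, 5
X = rng.standard_normal((n, d))
Y = np.sin(X.sum(axis=1, keepdims=True))

W1 = 0.1 * rng.standard_normal((d, h))   # input -> hidden
W2 = 0.1 * rng.standard_normal((h, 1))   # hidden -> output
Ws = 0.1 * rng.standard_normal((d, 1))   # shortcut: input -> output
lr = 0.05

for _ in range(200):
    # Forward pass with a shortcut connection.
    net1 = X @ W1
    a1 = np.tanh(net1)
    out = a1 @ W2 + X @ Ws
    err = out - Y

    # Center every gradient factor about zero.
    Xc = X - X.mean(axis=0)                # input activities
    a1c = a1 - a1.mean(axis=0)             # hidden activities
    errc = err - err.mean(axis=0)          # error signals
    slope = tanh_slope(net1)
    slopec = slope - slope.mean(axis=0)    # activation slopes

    # Slope centering strips the linear (mean-slope) component from
    # the backpropagated error; the shortcut Ws learns it directly.
    d1 = (errc @ W2.T) * slopec

    # Gradient descent on the centered factors.
    W2 -= lr * (a1c.T @ errc) / n
    Ws -= lr * (Xc.T @ errc) / n
    W1 -= lr * (Xc.T @ d1) / n
```

Because the mean slope is subtracted in the backward pass, the hidden layer in this sketch trains only on the nonlinear residue of the error, while the shortcut weights Ws absorb the linear component directly, illustrating the improved credit assignment the abstract describes.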



References

  1. Anderson, J., Rosenfeld, E. (eds.): Neurocomputing: Foundations of Research. MIT Press, Cambridge (1988)

  2. Battiti, R.: Accelerated back-propagation learning: Two optimization methods. Complex Systems 3, 331–342 (1989)

  3. Battiti, R.: First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation 4(2), 141–166 (1992)

  4. Bienenstock, E., Cooper, L., Munro, P.: Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2 (1982); reprinted in [1]

  5. Deterding, D.H.: Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge (1989)

  6. Finke, M., Müller, K.-R.: Estimating a-posteriori probabilities using stochastic network models. In: Mozer, M.C., Smolensky, P., Touretzky, D.S., Elman, J.L., Weigend, A.S. (eds.) Proceedings of the 1993 Connectionist Models Summer School, Boulder, CO. Lawrence Erlbaum Associates, Hillsdale (1994)

  7. Hastie, T.J., Tibshirani, R.J.: Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(6), 607–616 (1996)

  8. Herrmann, M.: On the merits of topography in neural maps. In: Kohonen, T. (ed.) Proceedings of the Workshop on Self-Organizing Maps, pp. 112–117. Helsinki University of Technology (1997)

  9. Hochreiter, S., Schmidhuber, J.: Feature extraction through lococode. Neural Computation (1998, to appear)

  10. Intrator, N.: Feature extraction using an unsupervised neural network. Neural Computation 4(1), 98–107 (1992)

  11. Lapedes, A., Farber, R.: A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D 22, 247–259 (1986)

  12. LeCun, Y., Kanter, I., Solla, S.A.: Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters 66(18), 2396–2399 (1991)

  13. Robinson, A.J.: Dynamic Error Propagation Networks. PhD thesis, University of Cambridge (1989)

  14. Schraudolph, N.N., Sejnowski, T.J.: Unsupervised discrimination of clustered data via optimization of binary information gain. In: Hanson, S.J., Cowan, J.D., Giles, C.L. (eds.) Advances in Neural Information Processing Systems, vol. 5, pp. 499–506. Morgan Kaufmann, San Mateo (1993)

  15. Schraudolph, N.N., Sejnowski, T.J.: Tempering backpropagation networks: Not all weights are created equal. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems, vol. 8, pp. 563–569. MIT Press, Cambridge (1996)

  16. Sejnowski, T.J.: Storing covariance with nonlinearly interacting neurons. Journal of Mathematical Biology 4, 303–321 (1977)

  17. Shah, S., Palmieri, F., Datum, M.: Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks 5, 779–787 (1992)

  18. Tenenbaum, J.B., Freeman, W.T.: Separating style and content. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9, pp. 662–668. MIT Press, Cambridge (1997)

  19. Turney, P.D.: Exploiting context when learning to classify. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 402–407. Springer, Heidelberg (1993)

  20. Turney, P.D.: Robust classification with context-sensitive features. In: Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pp. 268–276 (1993)

  21. Vogl, T.P., Mangis, J.K., Rigler, A.K., Zink, W.T., Alkon, D.L.: Accelerating the convergence of the back-propagation method. Biological Cybernetics 59, 257–263 (1988)

  22. Widrow, B., McCool, J.M., Larimore, M.G., Johnson Jr., C.R.: Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE 64(8), 1151–1162 (1976)

  23. Zimmermann, H.G.: Neuronale Netze als Entscheidungskalkül [Neural networks as a decision calculus]. In: Rehkugler, H., Zimmermann, H.G. (eds.) Neuronale Netze in der Ökonomie: Grundlagen und finanzwirtschaftliche Anwendungen [Neural Networks in Economics: Foundations and Financial Applications], pp. 1–87. Vahlen Verlag, Munich (1994)




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schraudolph, N.N. (2012). Centering Neural Network Gradient Factors. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_14

  • DOI: https://doi.org/10.1007/978-3-642-35289-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35288-1

  • Online ISBN: 978-3-642-35289-8

  • eBook Packages: Computer Science, Computer Science (R0)
