Abstract
This chapter gives a view of the complexity of loading problems based on the evaluation of gradient-based heuristics. It is shown that the complexity of the loading problem is strongly related to the problem at hand, and several problems are exhibited for which the error function has no suboptimal local minima.
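For concreteness, the loading problem referred to above is the minimization of a network error function over its weights. A minimal formulation, with notation assumed here rather than fixed by the abstract, is

\[
  E(w) \;=\; \sum_{p=1}^{P} \bigl\| d_p - f(x_p; w) \bigr\|^{2},
\]

where f(x_p; w) denotes the network output on the p-th input pattern and d_p the corresponding target. Saying that E has no suboptimal local minima means that every stationary point attains the global minimum of E, so a gradient-based heuristic cannot get trapped away from an optimal solution.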
In addition to the analysis of local minima, it is proven that there are problems which turn out to be intractable for gradient-based heuristics simply because the heuristic itself vanishes regardless of the parameter initialization. The problem of capturing long-term dependencies is an example of such a hard problem, especially in the case of sequences. In the case of data structures the problem does not seem to be as serious as for sequences, an important fact which suggests looking at many practical problems, where the decision process involves dynamic inputs, from a different perspective.
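The vanishing of the heuristic on long-term dependencies can be sketched with the standard chain-rule argument for a recurrent network with state update a_t = \sigma(W a_{t-1} + U x_t); the notation and the bound below are the usual ones, not taken from the chapter itself:

\[
  \frac{\partial a_T}{\partial a_t}
  \;=\; \prod_{\tau=t+1}^{T} \frac{\partial a_\tau}{\partial a_{\tau-1}},
  \qquad
  \left\| \frac{\partial a_T}{\partial a_t} \right\|
  \;\le\; \prod_{\tau=t+1}^{T}
  \left\| \operatorname{diag}\bigl(\sigma'\bigr)\, W \right\|
  \;\le\; \lambda^{\,T-t} .
\]

When \lambda < 1, which is precisely the regime in which the network can robustly latch information over time, the gradient contribution of an input at lag T - t decays exponentially with the lag, so the heuristic vanishes regardless of how the parameters are initialized.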
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gori, M. (1998). The loading problem: Topics in complexity. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053998
Print ISBN: 978-3-540-64341-8
Online ISBN: 978-3-540-69752-7