Abstract
Minimising a least-mean-squares cost function yields poor results over the ranges of the input variable where the quantity to be approximated takes relatively low values. This is a problem when an accurate approximation is required over a wide dynamic range. The present paper addresses this problem for multilayer perceptrons trained to approximate the posterior conditional probabilities in a multicategory classification task. A cost function derived from the Kullback–Leibler information distance measure is proposed, and a computationally light algorithm is derived for its minimisation. The effectiveness of the procedure is verified experimentally.
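The contrast the abstract draws can be illustrated with a standard softmax output layer. The sketch below is not the paper's algorithm (the function names and the two-class example are illustrative assumptions); it shows the well-known fact that the gradient of a KL/cross-entropy cost with respect to the pre-softmax activations collapses to q − p, one reason such a minimisation is computationally light, whereas the least-mean-squares gradient carries the full softmax Jacobian and is damped wherever the outputs are small.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_cost(p_true, z):
    """KL distance D(p_true || softmax(z)); equals cross-entropy up to a
    constant that does not depend on the network parameters."""
    q = softmax(z)
    return np.sum(p_true * np.log(p_true / q))

def kl_grad_z(p_true, z):
    """Gradient of the KL cost w.r.t. the pre-softmax activations z.
    For a softmax output layer it reduces to q - p: no extra derivative
    factor, hence a cheap backward step."""
    return softmax(z) - p_true

def mse_grad_z(p_true, z):
    """Gradient of 0.5 * ||softmax(z) - p||^2 w.r.t. z, for comparison:
    it keeps the softmax Jacobian, so it shrinks where outputs are small."""
    q = softmax(z)
    d = q - p_true                      # dE/dq
    return q * (d - np.sum(d * q))      # softmax Jacobian-vector product

# Tiny two-class illustration: the target posterior of class 0 is small.
p = np.array([0.05, 0.95])              # target posteriors
z = np.array([0.0, 1.0])                # pre-softmax activations
print(kl_grad_z(p, z))
print(mse_grad_z(p, z))
```

Note how the MSE gradient components are scaled by the (small) output probabilities, which is the damping effect in low-valued ranges that motivates the KL-based cost.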
Cite this article
Battisti, M., Burrascano, P. & Pirollo, D. Efficient Minimisation of the KL Distance for the Approximation of Posterior Conditional Probabilities. Neural Processing Letters 5, 47–55 (1997). https://doi.org/10.1023/A:1009605310499