Elsevier

Neurocomputing

Volume 13, Issues 2–4, October 1996, Pages 375-383
Letter
Improved binary classification performance using an information theoretic criterion

https://doi.org/10.1016/0925-2312(96)00025-2

Abstract

Feedforward neural networks trained to solve classification problems define an approximation of the conditional probabilities P(Ci|x) if the output units correspond to the categories Ci. The present paper shows that if a least mean squared error cost function is minimised during the training phase, the resulting approximation of the P(Ci|x) is poor in the ranges of the input variable x where the conditional probabilities take on very low values. The use of the Kullback-Leibler distance measure is proposed to overcome this limitation: a cost function derived from this information-theoretic measure is defined, and a computationally light training procedure is derived for the case of binary classification problems. The effectiveness of the proposed procedure is verified by means of comparative experiments.
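The contrast drawn in the abstract can be illustrated with a small numerical sketch. It is not the paper's cost function or training procedure, which the abstract does not spell out; it only assumes the standard Kullback-Leibler divergence between two Bernoulli distributions and compares it with the squared error. The probabilities and the error size below are hypothetical values chosen for illustration.

```python
import math


def squared_error(p, q):
    """Squared difference between the true probability p and the estimate q."""
    return (p - q) ** 2


def binary_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats.

    Assumes 0 < q < 1; terms with zero weight are skipped.
    """
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


# Same absolute estimation error in two regimes (illustrative values only):
# a mid-range probability and a very low one.
abs_error = 0.009
cases = {
    "mid": (0.5, 0.5 + abs_error),
    "low": (0.001, 0.001 + abs_error),
}

results = {
    name: (squared_error(p, q), binary_kl(p, q))
    for name, (p, q) in cases.items()
}

for name, (se, kl) in results.items():
    print(f"{name}: squared error = {se:.3e}, KL = {kl:.3e}")
```

Under these assumptions the squared error is the same in both regimes, whereas the Kullback-Leibler term is far larger when the true probability is very low. That is one way to read the abstract's claim that a squared-error criterion gives little incentive to fit low-probability regions accurately.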

