Abstract
This paper presents a study of two learning criteria and two approaches to using them for training neural network classifiers, specifically Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks. The first, traditional approach relies on two popular learning criteria: minimising a Mean Squared Error (MSE) function or a Cross Entropy (CE) function. It is shown that the two criteria have different characteristics in learning speed and sensitivity to outliers, and that this approach does not necessarily result in a minimal classification error. To better suit classification tasks, the second approach introduces an empirical classification criterion for the testing process while retaining the MSE or CE function for training. Experimental results on several benchmarks indicate that the second approach, compared with the first, leads to improved generalisation performance, and that the use of the CE function, compared with the MSE function, gives a faster training speed and improved or equal generalisation performance.
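The two training objectives and the empirical classification criterion described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the argmax decision rule and the NumPy formulation are assumptions, with E and L written for 1-of-c binary targets as defined in the abbreviations below.

```python
import numpy as np

def mse(y, t):
    # E: mean sum-of-squares error over N patterns and c outputs
    # (the 1/2 factor is a common convention, assumed here)
    return 0.5 * np.mean(np.sum((y - t) ** 2, axis=1))

def cross_entropy(y, t, eps=1e-12):
    # L: cross entropy for 1-of-c binary targets t and outputs y in (0, 1)
    return -np.mean(np.sum(t * np.log(y + eps), axis=1))

def classification_error(y, t):
    # Empirical classification criterion: fraction of patterns whose
    # largest output does not match the target class
    return np.mean(np.argmax(y, axis=1) != np.argmax(t, axis=1))
```

Under the second approach, training gradients would still come from mse or cross_entropy, while classification_error on held-out data would decide when to stop training or which model to keep.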
Abbreviations
- x: random input vector with d real-number components [x_1 ... x_d]
- t: random target vector with c binary components [t_1 ... t_c]
- y(·): neural network function or output vector
- θ: parameters of a neural model
- η: learning rate
- α: momentum
- γ: decay factor
- O: objective function
- E: mean sum-of-squares error function
- L: cross entropy function
- n: index of the nth training pattern
- N: number of training patterns
- ϕ(·): transfer function in a neural unit
- z_j: output of hidden unit j
- a_i: activation of unit i
- W_ij: weight from hidden unit j to output unit i
- W^0_jl: weight from input unit l to hidden unit j
- μ_j: centre vector [μ_j1 ... μ_jd] of RBF unit j
- σ_j: width vector [σ_j1 ... σ_jd] of RBF unit j
- p(·|·): conditional probability function
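As one concrete reading of these symbols, the sketch below composes them into an RBF network forward pass. It is an illustration under stated assumptions rather than the paper's implementation: a Gaussian response is assumed for ϕ(·) in the hidden units, a logistic output is assumed for the output units, and rbf_forward is a hypothetical name.

```python
import numpy as np

def rbf_forward(x, mu, sigma, W):
    """Forward pass of an RBF network in the notation above.

    x:     input vector [x_1 ... x_d]
    mu:    centre vectors, one row [mu_j1 ... mu_jd] per hidden unit j
    sigma: width vectors, one row [sigma_j1 ... sigma_jd] per hidden unit j
    W:     weight matrix with entries W_ij from hidden unit j to output unit i
    """
    # z_j: Gaussian response of hidden unit j (assumed form of phi)
    z = np.exp(-np.sum(((x - mu) / sigma) ** 2, axis=1) / 2.0)
    # a_i: activation of output unit i
    a = W @ z
    # y_i: logistic output, one possible choice for the output transfer function
    return 1.0 / (1.0 + np.exp(-a))
```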
Cite this article
Zhou, P., Austin, J. Learning criteria for training neural network classifiers. Neural Comput & Applic 7, 334–342 (1998). https://doi.org/10.1007/BF01428124