Abstract
This paper presents a study of two learning criteria and two approaches to using them for training neural network classifiers, specifically Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks. The first, traditional approach relies on two popular learning criteria: minimising a Mean Squared Error (MSE) function or a Cross Entropy (CE) function. It is shown that the two criteria have different characteristics in learning speed and sensitivity to outliers, and that this approach does not necessarily result in a minimal classification error. To better suit classification tasks, the second approach introduces an empirical classification criterion for the testing process while retaining the MSE or CE function for training. Experimental results on several benchmarks indicate that the second approach, compared with the first, leads to improved generalisation performance, and that the use of the CE function, compared with the MSE function, gives a faster training speed and improved or equal generalisation performance.
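The two training objectives and the empirical classification criterion described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the argmax decision rule and the NumPy formulation are assumptions, with E and L written for 1-of-c binary targets as defined in the abbreviations below.

```python
import numpy as np

def mse(y, t):
    # E: mean sum-of-squares error over N patterns and c outputs
    # (the 1/2 factor is a common convention, assumed here)
    return 0.5 * np.mean(np.sum((y - t) ** 2, axis=1))

def cross_entropy(y, t, eps=1e-12):
    # L: cross entropy for 1-of-c binary targets t and outputs y in (0, 1)
    return -np.mean(np.sum(t * np.log(y + eps), axis=1))

def classification_error(y, t):
    # Empirical classification criterion: fraction of patterns whose
    # largest output does not match the target class
    return np.mean(np.argmax(y, axis=1) != np.argmax(t, axis=1))
```

Under the second approach, training gradients would still come from mse or cross_entropy, while classification_error on held-out data would decide when to stop training or which model to keep.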
Abbreviations
- x: random input vector with d real-number components [x_1 ... x_d]
- t: random target vector with c binary components [t_1 ... t_c]
- y(·): neural network function or output vector
- θ: parameters of a neural model
- η: learning rate
- α: momentum
- γ: decay factor
- O: objective function
- E: mean sum-of-squares error function
- L: cross entropy function
- n: index of the nth training pattern
- N: number of training patterns
- ϕ(·): transfer function in a neural unit
- z_j: output of hidden unit j
- a_i: activation of unit i
- W_ij: weight from hidden unit j to output unit i
- W^0_jl: weight from input unit l to hidden unit j
- μ_j: centre vector [μ_j1 ... μ_jd] of RBF unit j
- σ_j: width vector [σ_j1 ... σ_jd] of RBF unit j
- p(·|·): conditional probability function
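As one concrete reading of these symbols, the sketch below composes them into an RBF network forward pass. It is an illustration under stated assumptions rather than the paper's implementation: a Gaussian response is assumed for ϕ(·) in the hidden units, a logistic output is assumed for the output units, and rbf_forward is a hypothetical name.

```python
import numpy as np

def rbf_forward(x, mu, sigma, W):
    """Forward pass of an RBF network in the notation above.

    x:     input vector [x_1 ... x_d]
    mu:    centre vectors, one row [mu_j1 ... mu_jd] per hidden unit j
    sigma: width vectors, one row [sigma_j1 ... sigma_jd] per hidden unit j
    W:     weight matrix with entries W_ij from hidden unit j to output unit i
    """
    # z_j: Gaussian response of hidden unit j (assumed form of phi)
    z = np.exp(-np.sum(((x - mu) / sigma) ** 2, axis=1) / 2.0)
    # a_i: activation of output unit i
    a = W @ z
    # y_i: logistic output, one possible choice for the output transfer function
    return 1.0 / (1.0 + np.exp(-a))
```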
Cite this article
Zhou, P., Austin, J. Learning criteria for training neural network classifiers. Neural Comput & Applic 7, 334–342 (1998). https://doi.org/10.1007/BF01428124