Abstract
After Minsky and Papert (Perceptrons, MIT Press, Cambridge, 1969) showed that perceptrons cannot solve nonlinearly separable problems, this result was misinterpreted for several decades as an inherent weakness common to all single-layer neural networks. The introduction of the backpropagation algorithm reinforced this misinterpretation, since its success on nonlinearly separable problems came through the training of multilayer neural networks. Recently, Conaway and Kurtz (Neural Comput 29(3):861–866, 2017) proposed a single-layer network in which the number of output units for each class equals the number of input units, and showed that it could solve some nonlinearly separable problems. They used the MSE (mean square error) between the input units and the output units of the actual class as the objective function for training the network, and showed that their method could solve the XOR and M&S’81 problems, although it performed no better than random guessing on the 3-bit parity problem. In this paper, we use a soft competitive approach to generalize the CE (cross-entropy) loss, a widely accepted criterion for multiclass classification, to networks that have several output units per class; we call the resulting measure the CCE (competitive cross-entropy) loss. In contrast to Conaway and Kurtz (2017), our method allows the number of output units for each class to be chosen arbitrarily. We show that the proposed method successfully solves the 3-bit parity problem, in addition to the XOR and M&S’81 problems. Furthermore, we perform experiments on several multiclass classification datasets, comparing a single-layer network trained with the proposed CCE loss against LVQ, linear SVM, a single-layer network trained with the CE loss, and the method of Conaway and Kurtz (2017). The results show that the CCE loss performs markedly better than existing algorithms for training single-layer neural networks.
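The abstract describes generalizing cross-entropy to networks with several output units per class via soft competition. The paper's exact formulation is not reproduced here; the following is a minimal illustrative sketch of one plausible reading, in which the units belonging to a class compete softly through a log-sum-exp aggregation and the aggregated per-class scores are passed to an ordinary softmax cross-entropy. The function name `cce_loss` and this particular aggregation are assumptions for illustration, not the authors' definition.

```python
import numpy as np

def cce_loss(logits, y, units_per_class):
    """Illustrative competitive cross-entropy sketch (not the paper's exact loss).

    logits: 1-D array of outputs of a single-layer network, laid out so that
            each class owns `units_per_class` consecutive output units.
    y: index of the true class.
    """
    n_classes = logits.size // units_per_class
    grouped = logits.reshape(n_classes, units_per_class)
    # Soft competition within each class: log-sum-exp over that class's units,
    # so the best-responding unit dominates but all units receive gradient.
    class_scores = np.log(np.exp(grouped).sum(axis=1))
    # Standard softmax cross-entropy over the aggregated class scores.
    log_probs = class_scores - np.log(np.exp(class_scores).sum())
    return -log_probs[y]
```

With one unit per class this sketch reduces to the ordinary CE loss, which matches the abstract's framing of CCE as a generalization of CE; with several units per class, multiple units can specialize on different regions of a nonlinearly separable class.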
Notes
These datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.
Note that in this experiment the proposed method has 60 output neurons, whereas the method of Conaway and Kurtz [4] has 7840.
References
Bagarello F, Cinà M, Gargano F (2017) Projector operators in clustering. Math Methods Appl Sci 40(1):49–59
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Chan TH, Jia K, Gao S, Lu J, Zeng Z, Ma Y (2015) PCANet: a simple deep learning baseline for image classification? IEEE Trans Image Process 24(12):5017–5032
Conaway N, Kurtz KJ (2017) Solving nonlinearly separable classifications in a single-layer neural network. Neural Comput 29(3):861–866
Crammer K, Dekel O, Keshet J, Shalev-Shwartz S, Singer Y (2006) Online passive-aggressive algorithms. J Mach Learn Res 7(Mar):551–585
Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9(Aug):1871–1874
Girshick R, Donahue J, Darrell T, Malik J (2016) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158
Kohonen T (1995) Learning vector quantization. In: Self-organizing maps. Springer, pp 175–189
Kohonen T, Hynninen J, Kangas J, Laaksonen J, Torkkola K (1996) LVQ_PAK: the learning vector quantization program package. Technical report, Laboratory of Computer and Information Science, Rakentajanaukio 2 C
Martín-del Brío B (1996) A dot product neuron for hardware implementation of competitive networks. IEEE Trans Neural Netw 7(2):529–532
Medin DL, Schwanenflugel PJ (1981) Linear separability in classification learning. J Exp Psychol Hum Learn Mem 7(5):355
Mensink T, Verbeek J, Perronnin F, Csurka G (2013) Distance-based image classification: generalizing to new classes at near-zero cost. IEEE Trans Pattern Anal Mach Intell 35(11):2624–2637
Minsky M, Papert S (1969) Perceptrons. MIT Press, Cambridge
Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16(5):1063–1076
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–538
Siomau M (2014) A quantum model for autonomous learning automata. Quantum Inf Process 13(5):1211–1221
Urcid G, Ritter GX, Iancu L (2004) Single layer morphological perceptron solution to the n-bit parity problem. In: Iberoamerican congress on pattern recognition, Springer, pp 171–178
Zhu G, Lin L, Jiang Y (2017) Resolve XOR problem in a single layer neural network. In: IWACIII 2017, 5th international workshop on advanced computational intelligence and intelligent informatics, Fuji Technology Press Ltd
Ghiasi-Shirazi, K. Competitive Cross-Entropy Loss: A Study on Training Single-Layer Neural Networks for Solving Nonlinearly Separable Classification Problems. Neural Process Lett 50, 1115–1122 (2019). https://doi.org/10.1007/s11063-018-9906-5