Abstract
The paper reconsiders multilayer perceptron networks for the case where the Euclidean inner product is replaced by a semi-inner product (SIP). This is of interest whenever the dissimilarity between data is measured by a general norm, so that the Euclidean inner product is no longer consistent with the geometry of the data space. We prove mathematically that universal approximation completeness is guaranteed also for those networks where the employed semi-inner products are related either to uniformly convex or to reflexive Banach spaces. The most prominent examples of uniformly convex Banach spaces are the spaces \(L_{p}\) and \(l_{p}\) for \(1<p<\infty \). The result is valid for all discriminatory activation functions, including the sigmoid and the ReLU activation.
A. Engelsberger—Supported by an ESF PhD grant.
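To make this construction concrete, the following is a minimal numerical sketch of such a Banach-like perceptron (our illustration, not code from the paper; the function names giles_sip and sip_perceptron and the parameter choices are ours). It replaces the Euclidean dot product of a classical perceptron by the Giles semi-inner product of \(l_{p}\) [7]:

```python
import numpy as np

def giles_sip(x, y, p=3.0):
    """Giles semi-inner product [x, y] on l_p, 1 < p < infinity [7]:
    [x, y] = ||y||_p^(2 - p) * sum_i x_i * |y_i|^(p - 1) * sign(y_i).
    For p = 2 it reduces to the Euclidean inner product; [y, y] = ||y||_p^2."""
    norm_y = np.linalg.norm(y, ord=p)
    if norm_y == 0.0:
        return 0.0  # [x, 0] = 0
    return norm_y ** (2.0 - p) * np.sum(x * np.abs(y) ** (p - 1.0) * np.sign(y))

def sip_perceptron(x, w, b, p=3.0):
    """Banach-like perceptron: sigmoid activation applied to [x, w] + b."""
    return 1.0 / (1.0 + np.exp(-(giles_sip(x, w, p) + b)))

x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.3, -0.2])
print(sip_perceptron(x, w, b=0.1, p=2.0))  # coincides with a standard perceptron
print(sip_perceptron(x, w, b=0.1, p=3.0))  # response adapted to the l_3 geometry
```

The sigmoid can be exchanged for any discriminatory activation such as the ReLU; only the computation \(\left[ \textbf{x},\textbf{w}\right] \) differs from the classical perceptron.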
References
Bishop, C.: Pattern Recognition and Machine Learning. Springer, London (2006)
Braun, J., Griebel, M.: On a constructive proof of Kolmogorov’s superposition theorem. Constr. Approx. 30, 653–675 (2009). https://doi.org/10.1007/s00365-009-9054-2
Chieng, H., Wahid, N., Pauline, O., Perla, S.: Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning. Int. J. Adv. Intell. Inform. 4(2), 76–86 (2018)
Clarkson, J.: Uniformly convex spaces. Trans. Am. Math. Soc. 40, 396–414 (1936)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Sig. Syst. 2(4), 303–314 (1989). https://doi.org/10.1007/BF02551274
Faulkner, G.D.: Representation of linear functionals in a Banach space. Rocky Mt. J. Math. 7(4), 789–792 (1977)
Giles, J.: Classes of semi-inner-product spaces. Trans. Am. Math. Soc. 129, 436–446 (1967)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Gorban, A.: Approximation of continuous functions of several variables by an arbitrary nonlinear continuous function of one variable, linear functions, and their superpositions. Appl. Math. Lett. 11(3), 45–49 (1998)
Guilhoto, L.: An overview of artificial neural networks for mathematicians (2018). http://math.uchicago.edu/~may/REU2018/REUPapers/Guilhoto.pdf
Hanin, B.: Universal function approximation by deep neural networks with bounded width and ReLU activations. Mathematics 7(992), 1–9 (2019)
Hanner, O.: On the uniform convexity of \(L^p\) and \(l^p\). Ark. Mat. 3(19), 239–244 (1956)
Hertz, J.A., Krogh, A., Palmer, R.G.: Introduction to the Theory of Neural Computation, Volume 1 of Santa Fe Institute Studies in the Sciences of Complexity: Lecture Notes. Addison-Wesley, Redwood City (1991)
Kolmogorov, A.: On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR 114(5), 953–956 (1957)
Kolmogorov, A., Fomin, S.: Reelle Funktionen und Funktionalanalysis. VEB Deutscher Verlag der Wissenschaften, Berlin (1975)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), San Diego, vol. 25, pp. 1097–1105. Curran Associates Inc. (2012)
Kůrková, V.: Kolmogorov’s theorem and multilayer neural networks. Neural Netw. 5, 501–506 (1992)
Lange, M., Biehl, M., Villmann, T.: Non-Euclidean principal component analysis by Hebbian learning. Neurocomputing 147, 107–119 (2015)
LeCun, Y., Cortes, C., Burges, C.: The MNIST database (1998)
Lumer, G.: Semi-inner-product spaces. Trans. Am. Math. Soc. 100, 29–43 (1961)
Nath, B.: Topologies on generalized semi-inner product spaces. Compositio Mathematica 23(3), 309–316 (1971)
Ramachandran, P., Zoph, B., Le, Q.: Searching for activation functions. Technical report, Google Brain (2018). arXiv:1710.05941v1
Riesz, F., Sz.-Nagy, B.: Vorlesungen über Funktionalanalysis, 4th edn. Verlag Harri Deutsch, Frankfurt/M. (1982)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)
Rudin, W.: Functional Analysis, 2nd edn. McGraw-Hill Inc., New York (1991)
Steinwart, I., Christmann, A.: Support Vector Machines. Information Science and Statistics, Springer, Heidelberg (2008). https://doi.org/10.1007/978-0-387-77242-4
Triebel, H.: Analysis und mathematische Physik, 3rd revised edn. BSB B.G. Teubner Verlagsgesellschaft, Leipzig (1989)
Villmann, T., Haase, S., Kaden, M.: Kernelized vector quantization in gradient-descent learning. Neurocomputing 147, 83–95 (2015)
Villmann, T., Ravichandran, J., Villmann, A., Nebel, D., Kaden, M.: Investigation of activation functions for generalized learning vector quantization. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J.D. (eds.) WSOM 2019. AISC, vol. 976, pp. 179–188. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-19642-4_18
Zhang, H., Xu, Y., Zhang, J.: Reproducing kernel Banach spaces for machine learning. J. Mach. Learn. Res. 10, 2741–2775 (2009)
Zhang, H., Zhang, J.: Generalized semi-inner products with applications to regularized learning. J. Math. Anal. Appl. 372, 181–196 (2010)
Appendix
In this appendix we collect some useful definitions regarding SIPs and Banach spaces that are used in the main text, together with some basic statements and remarks.
Definition 23
A Banach space \(\mathcal {B}\) is said to be strictly convex iff for all \(\textbf{x},\textbf{y}\ne 0\) with \(\left\| \textbf{x}\right\| +\left\| \textbf{y}\right\| =\left\| \textbf{x}+\textbf{y}\right\| \) it always follows that \(\textbf{x}=\lambda \textbf{y}\) for some \(\lambda >0\).
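For illustration (a standard example added by us): \(l_{1}\) is not strictly convex, since for \(\textbf{x}=\left( 1,0\right) \) and \(\textbf{y}=\left( 0,1\right) \) we have \(\left\| \textbf{x}\right\| _{1}+\left\| \textbf{y}\right\| _{1}=2=\left\| \textbf{x}+\textbf{y}\right\| _{1}\), although \(\textbf{x}\ne \lambda \textbf{y}\) for every \(\lambda >0\).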
Lemma 24
A Banach space \(\mathcal {B}\) with SIP \(\left[ \cdot ,\cdot \right] \) is strictly convex iff for all \(\textbf{x},\textbf{y}\ne 0\) with \(\left[ \textbf{x},\textbf{y}\right] =\left\| \textbf{x}\right\| \cdot \left\| \textbf{y}\right\| \) it always follows that \(\textbf{x}=\lambda \textbf{y}\) for some \(\lambda >0\).
Proof
The proof can be found in [7]. \(\square \)
The following definition of uniform convexity was introduced in [4]:
Definition 25
A Banach space \(\mathcal {B}\) is said to be uniformly convex iff for each \(\varepsilon >0\) there exists a \(\delta \left( \varepsilon \right) >0\) such that \(\left\| \textbf{x}\right\| =\left\| \textbf{y}\right\| =1\) and \(\left\| \textbf{x}-\textbf{y}\right\| >\varepsilon \) imply \(\frac{\left\| \textbf{x}+\textbf{y}\right\| }{2}<1-\delta \left( \varepsilon \right) \).
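For illustration (a standard fact added by us): every Hilbert space is uniformly convex. By the parallelogram law, \(\left\| \textbf{x}+\textbf{y}\right\| ^{2}=2\left\| \textbf{x}\right\| ^{2}+2\left\| \textbf{y}\right\| ^{2}-\left\| \textbf{x}-\textbf{y}\right\| ^{2}<4-\varepsilon ^{2}\) for unit vectors with \(\left\| \textbf{x}-\textbf{y}\right\| >\varepsilon \), so \(\delta \left( \varepsilon \right) =1-\sqrt{1-\frac{\varepsilon ^{2}}{4}}\) is a valid choice.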
Definition 26
A Banach space \(\mathcal {B}\) with SIP \(\left[ \cdot ,\cdot \right] \) is said to be continuous iff

\(\lim _{\lambda \rightarrow 0}\left[ \textbf{y},\textbf{x}+\lambda \textbf{y}\right] =\left[ \textbf{y},\textbf{x}\right] \)

is valid for \(\lambda \in \mathbb {R}\) and all \(\textbf{x},\textbf{y}\in \mathcal {B}\) with \(\left\| \textbf{x}\right\| =\left\| \textbf{y}\right\| =1\). The space is uniformly continuous iff this limit is approached uniformly for all such pairs \(\left( \textbf{x},\textbf{y}\right) \).
Definition 27
A Banach space \(\mathcal {B}\) is said to be reflexive iff the canonical embedding \(J:\mathcal {B}\rightarrow \mathcal {B}^{**}=\left( \mathcal {B}^{*}\right) ^{*}\), defined by \(J\left( \textbf{x}\right) \left( f\right) =f\left( \textbf{x}\right) \) for all \(f\in \mathcal {B}^{*}\), is surjective, where the star indicates the dual space.
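For illustration (standard facts added by us): the spaces \(l_{p}\) with \(1<p<\infty \) are reflexive, since \(\left( l_{p}\right) ^{*}=l_{q}\) with \(\frac{1}{p}+\frac{1}{q}=1\) implies \(\left( l_{p}\right) ^{**}=l_{p}\), whereas \(l_{1}\) is not reflexive. More generally, every uniformly convex Banach space is reflexive by the Milman-Pettis theorem.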
Theorem 28
Let \(\mathcal {B}\) be a Banach space. Then a necessary and sufficient condition for \(\mathcal {B}\) to be reflexive is that for every \(f\in \mathcal {B}^{*}\) there exist an SIP \(\left[ \cdot ,\cdot \right] \) and an element \(\textbf{y}\in \mathcal {B}\) with \(f\left( \textbf{x}\right) =\left[ \textbf{x},\textbf{y}\right] \) for all \(\textbf{x}\in \mathcal {B}\). If \(\mathcal {B}\) is strictly convex, then \(\textbf{y}\) is unique.
Proof
The proof can be found in [6, Theorem 2]. \(\square \)
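As an illustration (standard facts added by us): in a Hilbert space the inner product is the unique SIP, and Theorem 28 reduces to the Riesz representation theorem. For \(\mathcal {B}=l_{p}\) with \(1<p<\infty \), the Giles SIP [7] provides the representation, and the representer \(\textbf{y}\) is unique because these spaces are strictly convex.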
Definition 29
A Banach space \(\mathcal {B}\) is said to be smooth iff for each \(\textbf{x}\in \mathcal {B}\) with \(\left\| \textbf{x}\right\| =1\) there exists exactly one linear functional \(f_{\textbf{x}}\in \mathcal {B}^{*}\) with \(\left\| f_{\textbf{x}}\right\| =1\) and \(f_{\textbf{x}}\left( \textbf{x}\right) =1\). The existence of such an \(f_{\textbf{x}}\) is guaranteed by the Hahn-Banach theorem; smoothness requires its uniqueness.
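For illustration (standard facts added by us): every Hilbert space is smooth, with \(f_{\textbf{x}}=\left\langle \cdot ,\textbf{x}\right\rangle \) being the unique support functional at \(\textbf{x}\). In contrast, \(l_{1}\) is not smooth: at \(\textbf{x}=\left( 1,0,0,\ldots \right) \), every \(f=\left( 1,a_{2},a_{3},\ldots \right) \in l_{\infty }\) with \(\left| a_{i}\right| \le 1\) satisfies \(\left\| f\right\| _{\infty }=1\) and \(f\left( \textbf{x}\right) =1\).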