
Multilayer Perceptrons with Banach-Like Perceptrons Based on Semi-inner Products – About Approximation Completeness

  • Conference paper
  • In: Artificial Intelligence and Soft Computing (ICAISC 2022)

Abstract

The paper reconsiders multilayer perceptron networks for the case where the Euclidean inner product is replaced by a semi-inner product. This is of interest whenever the dissimilarity between data points is measured by a general norm, so that the Euclidean inner product is no longer consistent with that setting. We prove mathematically that universal approximation completeness is also guaranteed for those networks whose semi-inner products are related either to uniformly convex or to reflexive Banach spaces. The most prominent examples of uniformly convex Banach spaces are the spaces \(L_{p}\) and \(l_{p}\) for \(1<p<\infty \). The result holds for all discriminatory activation functions, including the sigmoid and the ReLU activation.
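To make the construction concrete, the following minimal sketch (not taken from the paper) shows how a single perceptron unit could replace the Euclidean dot product by the Giles semi-inner product of \(l_{p}\); the function names, the sigmoid activation, and the choice \(p=3\) are illustrative assumptions only.

```python
import numpy as np

def giles_sip(x, w, p=3.0):
    """Giles semi-inner product [x, w]_p on l_p for 1 < p < infinity.

    Reduces to the Euclidean inner product for p = 2 and satisfies
    [w, w]_p = ||w||_p^2.
    """
    norm_w = np.linalg.norm(w, ord=p)
    if norm_w == 0.0:
        return 0.0
    return float(np.sum(x * np.abs(w) ** (p - 1) * np.sign(w)) / norm_w ** (p - 2))

def sip_perceptron(x, w, b, p=3.0):
    """Banach-like perceptron unit: sigmoid([x, w]_p + b)."""
    return 1.0 / (1.0 + np.exp(-(giles_sip(x, w, p) + b)))

# For p = 2 the unit coincides with a standard Euclidean perceptron.
x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.3, -0.7])
print(sip_perceptron(x, w, b=0.1, p=2.0))  # Euclidean case
print(sip_perceptron(x, w, b=0.1, p=3.0))  # l_3 semi-inner product
```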

A. Engelsberger—Supported by an ESF PhD grant.


References

  1. Bishop, C.: Pattern Recognition and Machine Learning. Springer, London (2006)

  2. Braun, J., Griebel, M.: On a constructive proof of Kolmogorov’s superposition theorem. Constr. Approx. 30, 653–675 (2009). https://doi.org/10.1007/s00365-009-9054-2

  3. Chieng, H., Wahid, N., Pauline, O., Perla, S.: Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning. Int. J. Adv. Intell. Inform. 4(2), 76–86 (2018)

  4. Clarkson, J.: Uniformly convex spaces. Trans. Am. Math. Soc. 40, 396–414 (1936)

  5. Cybenko, G.: Approximations by superpositions of a sigmoidal function. Math. Control Sig. Syst. 2(4), 303–314 (1989). https://doi.org/10.1007/BF02551274

  6. Faulkner, G.D.: Representation of linear functionals in a Banach space. Rocky Mt. J. Math. 7(4), 789–792 (1977)

  7. Giles, J.: Classes of semi-inner-product spaces. Trans. Am. Math. Soc. 129, 436–446 (1967)

  8. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  9. Gorban, A.: Approximation of continuous functions of several variables by an arbitrary nonlinear continuous function of one variable, linear functions, and their superpositions. Appl. Math. Lett. 11(3), 45–49 (1998)

  10. Guilhoto, L.: An overview of artificial neural networks for mathematicians (2018). http://math.uchicago.edu/~may/REU2018/REUPapers/Guilhoto.pdf

  11. Hanin, B.: Universal function approximation by deep neural networks with bounded width and ReLU activations. Mathematics 7(992), 1–9 (2019)

  12. Hanner, O.: On the uniform convexity of \(L^p\) and \(l^p\). Ark. Mat. 3(19), 239–244 (1956)

  13. Hertz, J.A., Krogh, A., Palmer, R.G.: Introduction to the Theory of Neural Computation, Volume 1 of Santa Fe Institute Studies in the Sciences of Complexity: Lecture Notes. Addison-Wesley, Redwood City (1991)

  14. Kolmogorov, A.: On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR 114(5), 953–956 (1957)

  15. Kolmogorov, A., Fomin, S.: Reelle Funktionen und Funktionalanalysis. VEB Deutscher Verlag der Wissenschaften, Berlin (1975)

  16. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), San Diego, vol. 25, pp. 1097–1105. Curran Associates Inc. (2012)

  17. Kůrková, V.: Kolmogorov’s theorem and multilayer neural networks. Neural Netw. 5, 501–506 (1992)

  18. Lange, M., Biehl, M., Villmann, T.: Non-Euclidean principal component analysis by Hebbian learning. Neurocomputing 147, 107–119 (2015)

  19. LeCun, Y., Cortes, C., Burges, C.: The MNIST database (1998)

  20. Lumer, G.: Semi-inner-product spaces. Trans. Am. Math. Soc. 100, 29–43 (1961)

  21. Nath, B.: Topologies on generalized semi-inner product spaces. Compositio Mathematica 23(3), 309–316 (1971)

  22. Ramachandran, P., Zoph, B., Le, Q.: Searching for activation functions. Technical report, Google Brain (2018). arXiv:1710.05941v1

  23. Riesz, F., Nagy, B.Sz.: Vorlesungen über Funktionalanalysis, 4th edn. Verlag Harri Deutsch, Frankfurt/M. (1982)

  24. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)

  25. Rudin, W.: Functional Analysis, 2nd edn. McGraw-Hill Inc., New York (1991)

  26. Steinwart, I., Christmann, A.: Support Vector Machines. Information Science and Statistics, Springer, Heidelberg (2008). https://doi.org/10.1007/978-0-387-77242-4

  27. Triebel, H.: Analysis und mathematische Physik, 3rd revised edn. BSB B.G. Teubner Verlagsgesellschaft, Leipzig (1989)

  28. Villmann, T., Haase, S., Kaden, M.: Kernelized vector quantization in gradient-descent learning. Neurocomputing 147, 83–95 (2015)

  29. Villmann, T., Ravichandran, J., Villmann, A., Nebel, D., Kaden, M.: Investigation of activation functions for generalized learning vector quantization. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J.D. (eds.) WSOM 2019. AISC, vol. 976, pp. 179–188. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-19642-4_18

  30. Zhang, H., Xu, Y., Zhang, J.: Reproducing kernel Banach spaces for machine learning. J. Mach. Learn. Res. 10, 2741–2775 (2009)

  31. Zhang, H., Zhang, J.: Generalized semi-inner products with applications to regularized learning. J. Math. Anal. Appl. 372, 181–196 (2010)


Author information

Correspondence to Thomas Villmann.

Appendix

In this appendix we collect some useful definitions regarding SIPs and Banach spaces that are used in the text, together with some basic statements and remarks.

Definition 23

A Banach space \(\mathcal {B}\) is denoted as strictly convex iff for \(\textbf{x},\textbf{y}\ne 0\) with \(\left\| \textbf{x}\right\| +\left\| \textbf{y}\right\| =\left\| \textbf{x}+\textbf{y}\right\| \) we can always conclude that \(\textbf{x}=\lambda \textbf{y}\) for some \(\lambda >0\).
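As a standard illustration, \(l_{1}\) is not strictly convex: for \(\textbf{x}=\left( 1,0,0,\ldots \right) \) and \(\textbf{y}=\left( 0,1,0,\ldots \right) \) we have \(\left\| \textbf{x}\right\| _{1}+\left\| \textbf{y}\right\| _{1}=2=\left\| \textbf{x}+\textbf{y}\right\| _{1}\), although \(\textbf{x}\ne \lambda \textbf{y}\) for every \(\lambda >0\). The spaces \(l_{p}\) with \(1<p<\infty \) mentioned in the abstract, in contrast, are strictly convex.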

Lemma 24

A Banach space \(\mathcal {B}\) with SIP \(\left[ \cdot ,\cdot \right] \) is strictly convex iff for \(\textbf{x},\textbf{y}\ne 0\) with \(\left[ \textbf{x},\textbf{y}\right] =\left\| \textbf{x}\right\| \cdot \left\| \textbf{y}\right\| \) we can always conclude that \(\textbf{x}=\lambda \textbf{y}\) for some \(\lambda >0\).

Proof

The proof can be found in [7].    \(\square \)

The following definition of uniform convexity was introduced in [4]:

Definition 25

A Banach space \(\mathcal {B}\) is denoted as uniformly convex iff for each \(\varepsilon >0\) there exists a \(\delta \left( \varepsilon \right) >0\) such that \(\left\| \textbf{x}\right\| =\left\| \textbf{y}\right\| =1\) with \(\left\| \textbf{x}-\textbf{y}\right\| >\varepsilon \) implies \(\frac{\left\| \textbf{x}+\textbf{y}\right\| }{2}<1-\delta \left( \varepsilon \right) \).
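As a standard illustration, the parallelogram law shows that every Hilbert space is uniformly convex: for \(\left\| \textbf{x}\right\| =\left\| \textbf{y}\right\| =1\) with \(\left\| \textbf{x}-\textbf{y}\right\| >\varepsilon \) one obtains

$$ \left\| \frac{\textbf{x}+\textbf{y}}{2}\right\| ^{2}=1-\frac{\left\| \textbf{x}-\textbf{y}\right\| ^{2}}{4}<1-\frac{\varepsilon ^{2}}{4}, $$

so \(\delta \left( \varepsilon \right) =1-\sqrt{1-\varepsilon ^{2}/4}\) is a valid choice. The uniform convexity of \(L_{p}\) and \(l_{p}\) for \(1<p<\infty \) is the content of [4] and [12].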

Definition 26

A Banach space \(\mathcal {B}\) with SIP \(\left[ \cdot ,\cdot \right] \) is denoted as continuous iff

$$ \Re \left\{ \left[ \textbf{x},\textbf{y}+\lambda \textbf{x}\right] \right\} \underset{\lambda \rightarrow 0}{\longrightarrow }\Re \left\{ \left[ \textbf{x},\textbf{y}\right] \right\} $$

is valid for \(\lambda \in \mathbb {R}\). The space is uniformly continuous iff this limit is approached uniformly.

Definition 27

A Banach space \(\mathcal {B}\) is denoted as reflexive iff the canonical embedding \(J:\mathcal {B}\rightarrow \mathcal {B}^{**}=\left( \mathcal {B}^{*}\right) ^{*}\), given by \(\left( J\textbf{x}\right) \left( f\right) =f\left( \textbf{x}\right) \) for \(f\in \mathcal {B}^{*}\), is surjective, where the star indicates the dual space.
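For example, the spaces \(l_{p}\) with \(1<p<\infty \) are reflexive, since

$$ \left( l_{p}\right) ^{*}\cong l_{q}\quad \text {with}\quad \frac{1}{p}+\frac{1}{q}=1, $$

and hence \(\left( l_{p}\right) ^{**}\cong \left( l_{q}\right) ^{*}\cong l_{p}\), with the canonical embedding \(J\) realizing this identification. More generally, every uniformly convex Banach space is reflexive (Milman–Pettis theorem).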

Theorem 28

Let \(\mathcal {B}\) be a Banach space. Then a necessary and sufficient condition for \(\mathcal {B}\) to be reflexive is that for every \(f\in \mathcal {B}^{*}\) there exist an SIP \(\left[ \cdot ,\cdot \right] \) and an element \(\textbf{y}\in \mathcal {B}\) with \(f\left( \textbf{x}\right) =\left[ \textbf{x},\textbf{y}\right] \) for all \(\textbf{x}\in \mathcal {B}\). If \(\mathcal {B}\) is strictly convex, then \(\textbf{y}\) is unique.

Proof

The proof can be found in [6, Theorem 2].    \(\square \)

Definition 29

A Banach space \(\mathcal {B}\) is denoted as smooth iff for each \(\textbf{x}\in \mathcal {B}\) with \(\left\| \textbf{x}\right\| =1\) there exists exactly one linear functional \(f_{\textbf{x}}\in \mathcal {B}^{*}\) with \(\left\| f_{\textbf{x}}\right\| =1\) and \(f_{\textbf{x}}\left( \textbf{x}\right) =\left\| f_{\textbf{x}}\right\| \). The existence of such a norming functional is guaranteed by the Hahn-Banach-Theorem; smoothness concerns its uniqueness.
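In particular, if \(\left[ \cdot ,\cdot \right] \) is an SIP generating the norm of \(\mathcal {B}\), then \(f_{\textbf{x}}=\left[ \cdot ,\textbf{x}\right] \) is such a norming functional, because \(f_{\textbf{x}}\left( \textbf{x}\right) =\left[ \textbf{x},\textbf{x}\right] =\left\| \textbf{x}\right\| ^{2}=1\) and \(\left\| f_{\textbf{x}}\right\| =\left\| \textbf{x}\right\| =1\) by the Cauchy-Schwarz inequality for SIPs. Hence, in a smooth Banach space the SIP compatible with the norm is uniquely determined (cf. [7]).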


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Villmann, T., Engelsberger, A. (2023). Multilayer Perceptrons with Banach-Like Perceptrons Based on Semi-inner Products – About Approximation Completeness. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2022. Lecture Notes in Computer Science, vol 13588. Springer, Cham. https://doi.org/10.1007/978-3-031-23492-7_14

  • DOI: https://doi.org/10.1007/978-3-031-23492-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23491-0

  • Online ISBN: 978-3-031-23492-7

