Contributed article

Space-filling curves and Kolmogorov superposition-based neural networks
Introduction
Referring to the Nomenclature, consider the two-dimensional version of the Kolmogorov (1957) superpositions

f(x1, x2) = Σ_{q=0}^{4} Φq(yq),    (1)

in which an arbitrary real-valued continuous function f: 𝕀² → ℝ (𝕀 = [0,1]) is computed with continuous functions Φq: ℝ → ℝ. The arguments

yq = α1ψ(x1 + qa) + α2ψ(x2 + qa), q = 0, 1, …, 4,    (2)

are fixed nomographic functions that are independent of f; the function ψ is monotonically increasing and continuous, and α1, α2 and a are suitable constants (Sprecher, 1996, Sprecher, 1997). In a pioneering paper, Hecht-Nielsen (1987) linked the n-dimensional version of the Kolmogorov superpositions to computer architecture by interpreting them as a four-layer architecture of a feedforward neural network, as represented schematically in Fig. 1. This architecture consists of two pairs of nonlinear→linear layers, the first pair constituting a dimension-dependent and otherwise fixed hidden layer, and the second pair constituting the output layer in which an arbitrary target function f is implemented. The nonlinear activation functions of the units in the hidden layer may therefore be implemented in hardware and hard-coded into the network.
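The four-layer structure can be sketched in code. In this minimal illustration, ψ is replaced by the identity, α1 = 1 and α2 = √2 serve as assumed rationally independent constants, a = 1/7 is a placeholder shift, and the Φq are stand-ins; none of these are the paper's actual constructions.

```python
import math

# Illustrative sketch of the Kolmogorov/Hecht-Nielsen four-layer structure.
# psi, the Phi_q, alpha1, alpha2 and a are all placeholders.
ALPHA1, ALPHA2 = 1.0, math.sqrt(2)  # assumed rationally independent constants
A = 1.0 / 7.0                       # placeholder shift constant a

def psi(x):
    """Placeholder monotonically increasing continuous inner function."""
    return x

def hidden_layer(x1, x2):
    """Fixed, f-independent layer: the five nomographic arguments y_q (Eq. 2)."""
    return [ALPHA1 * psi(x1 + q * A) + ALPHA2 * psi(x2 + q * A) for q in range(5)]

def output_layer(ys, phis):
    """f-dependent layer: the sum of five one-variable functions Phi_q (Eq. 1)."""
    return sum(phi(y) for phi, y in zip(phis, ys))

# Usage: with each Phi_q = y/5 the network computes a particular linear target.
phis = [lambda y: y / 5.0] * 5
ys = hidden_layer(0.3, 0.4)
value = output_layer(ys, phis)
```

Only the output layer depends on the target f; the hidden layer is fixed once and for all, which is what makes a hardware implementation attractive.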
The functions in Eq. (2) are the most basic nonlinear continuous functions that can be used to replace a pair of variables (x1, x2) with new variables yq, so that the computation of the two-variable function f(x1, x2) is carried out exclusively in terms of the five one-variable functions Φq(yq). This dimension reduction is accomplished with the binary operation '+' in Eq. (2), and correspondingly in the hidden layer of the network. Here we examine the underlying topological properties of these functions that enable this dimension reduction, and we begin with the general statement that there exist continuous curves passing through all points of a square, or a subset thereof, thereby inducing a linear order on an infinite dense set of points (x1, x2). The question of the existence of such curves followed Cantor's 1878 work, which demonstrated, among other things, that there is a one–one relationship between the points of a square and the points of an interval. The first space-filling curve, as these curves are called, was discovered by Peano in 1890, and mathematicians have since followed his discovery with a detailed study of their properties as well as the construction of a rich variety of space-filling curves (Sagan, 1991). Each of the functions in Eq. (2) determines a space-filling curve with the specific properties that are required to obtain Eq. (1). The subject of this paper is the construction of a space-filling curve characterizing the functions in Eq. (2), in turn illuminating the algorithm leading to the Kolmogorov superpositions and Hecht-Nielsen's neural network.
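The Cantor-style correspondence between the square and an interval can be sketched by interleaving decimal digits. This toy construction (which ignores the care needed with trailing 9s in a genuine bijection) shows how points of the square inherit a linear order from their image points on the interval:

```python
def interleave_digits(d1, d2):
    """Interleave the decimal digit lists of two coordinates into one list,
    mirroring Cantor's pairing of the square with an interval."""
    out = []
    for a, b in zip(d1, d2):
        out.extend((a, b))
    return out

def digits_to_number(digits):
    """Read a digit list as the decimal 0.d1 d2 d3 ..."""
    return sum(d * 10 ** -(i + 1) for i, d in enumerate(digits))

# Usage: the point (0.13, 0.24) maps to the parameter 0.1234, and distinct
# points of the square are ordered by their parameters on the interval.
t = digits_to_number(interleave_digits([1, 3], [2, 4]))
```

Note that this digit map is highly discontinuous; the point of the space-filling curves studied here is that continuity can be retained while still reaching a dense set of points of the square.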
Following Kolmogorov's original strategy, Eq. (1) is established through the construction of suitable families of disjoint closed squares of increasing refinement for each value of q, each family covering the unit square 𝕀² except for narrow horizontal and vertical strips. Each function in Eq. (2) is such that it maps its families of squares one–one onto families of disjoint closed intervals. The diameters of the squares and the widths of the gaps separating them tend asymptotically to zero, and so do the lengths of the corresponding intervals and their separating gaps. When each family of squares is appropriately joined pairwise at each stage, the squares form curves (chains) that converge to a space-filling curve. We are able to refer to such chains as curves because a square can be mapped onto its diagonal. The constructions in this paper are for the case q=0 in Eq. (2).
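The flavor of these families can be sketched as follows: at stage k, one closed square per pair of terminating decimals, separated by narrow strips, with diameters tending to zero as k grows. The gap width here is an illustrative choice, not Sprecher's actual parameter.

```python
import math

def stage_squares(k, gap_fraction=0.1):
    """Hypothetical stage-k family: one closed square per pair of terminating
    decimals, with side (1 - gap_fraction) * 10**-k, leaving narrow separating
    strips. gap_fraction is an illustrative value, not Sprecher's."""
    step = 10 ** -k
    side = (1 - gap_fraction) * step
    return [(i * step, j * step, side)  # (lower-left x, lower-left y, side)
            for i in range(10 ** k) for j in range(10 ** k)]

def diameter(square):
    """Diagonal length of a square given as (x, y, side)."""
    return square[2] * math.sqrt(2)
```

The disjointness of the squares (guaranteed by the separating strips) is what lets each family map one–one onto a family of disjoint intervals, and the vanishing diameters drive the convergence of the chains.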
Section 1 is devoted to a review of those properties of the function y0 that are necessary for the analytic construction of a space-filling curve Λ in Section 2. This construction culminates in a lemma and four theorems, all proved in Section 4. Section 3 contains a geometric construction of approximating curves Λk. MATLAB was used to generate computer graphs of Λ0 and Λ1; these graphs clearly reveal the nonlinearity of the function y0. An examination of the original Kolmogorov (1957) functions yr = ψ1,r(x1) + ψ2,r(x2), r = 0, 1, …, 4, suggests that these also generate similar space-filling curves, but this has yet to be verified.
The nomographic function ξ(x)
Our starting point is the nomographic function

ξ(x1, x2) = α1ψ(x1) + α2ψ(x2),

with ψ as defined in the Appendix, having rational values at the terminating decimals

Σ_{r=1}^{k} ir 10^{−r}, ir = 0, 1, 2, …, 9 and k = 1, 2, 3, …;

we write dk = (dk1, dk2) for pairs of such decimals. The normalized constants α1, α2, also defined in the Appendix, are rationally independent, so that if we set zk = ξ(dk) and z′k = ξ(d′k), then for fixed k, zk = z′k if and only if dk = d′k. Consequently, dk = ξ−1(zk) is well defined for fixed k, and the image points zk = ξ(dk) can be listed in increasing order:

0 = zk^1 < zk^2 < zk^3 < … < zk^{10^{2k}}.
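The injectivity of ξ on the grid points can be illustrated numerically. Below, ψ is replaced by the identity as a stand-in for the Appendix function, and α1 = 1, α2 = √2 serve as assumed rationally independent constants; for fixed k the 10^{2k} image values are then pairwise distinct and can be sorted into strictly increasing order.

```python
import math
from fractions import Fraction

# Stand-ins: psi = identity in place of the Appendix function, and
# alpha1 = 1, alpha2 = sqrt(2) as assumed rationally independent constants.
ALPHA1, ALPHA2 = 1.0, math.sqrt(2)

def grid_points(k):
    """All pairs of terminating decimals sum_{r=1..k} i_r 10^-r, i_r in 0..9."""
    step = Fraction(1, 10 ** k)
    return [(i * step, j * step)
            for i in range(10 ** k) for j in range(10 ** k)]

def xi(d):
    """Nomographic function xi(d1, d2) = alpha1*psi(d1) + alpha2*psi(d2)."""
    d1, d2 = d
    return ALPHA1 * float(d1) + ALPHA2 * float(d2)

# For fixed k the 10**(2*k) image values z_k = xi(d_k) are pairwise distinct.
images = sorted(xi(d) for d in grid_points(1))
```

Distinctness holds here because a coincidence ξ(d) = ξ(d′) for rational coordinates would force a rational relation between 1 and √2, mirroring the role of the rationally independent α1, α2 in the paper.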
The curve Λ
We begin with the observation that every infinite sequence of nested intervals Tk1(zk1) ⊃ Tk2(zk2) ⊃ … determines a unique point

z = ∩_{r≥1} Tkr(zkr),

as well as a unique sequence of squares Sk1(dk1), Sk2(dk2), …. This sequence may contain a nested sequence Sks(dks) ⊃ Sks+1(dks+1) ⊃ … defining, for some integer s ≥ 1, a unique image point of z:

d = ∩_{r≥s} Skr(dkr).

We shall show that the sequence dk1, dk2, dk3, … converges also when no such nested sequence exists. When z is not the infinite intersection of intervals Tkr(zkr), then …
Approximating curves
We begin with:

Definition 1. Let a point z be given. The curve ξ−1(z) is called a level-curve of the function ξ.
Because the function ψ is continuous and monotonically increasing and the constants α1 and α2 are positive, ξ−1(z) is a continuous, monotonically decreasing curve for fixed z, with initial and end points on the boundary of 𝕀². For fixed k and ω = 1, 2, 3, … we now connect each square to its immediate successor with a join to form a continuous chain (approximating curve) Λk.
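A numerical sketch of such a level-curve, using a placeholder strictly increasing ψ (the paper's ψ from the Appendix is not reproduced here) and illustrative positive constants, inverted by bisection:

```python
import math

# Placeholder strictly increasing psi and illustrative positive constants;
# not the paper's actual choices.
ALPHA1, ALPHA2 = 1.0, math.sqrt(2)

def psi(x):
    """A stand-in continuous, strictly increasing function on [0, 1]."""
    return x + x ** 3

def psi_inv(y, lo=0.0, hi=1.0, iters=60):
    """Invert the monotone psi on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def level_curve(z, n=50):
    """Sample points (x1, x2) with ALPHA1*psi(x1) + ALPHA2*psi(x2) = z.
    Solving for x2 yields a continuous, monotonically decreasing curve."""
    pts = []
    for i in range(n + 1):
        x1 = i / n
        y2 = (z - ALPHA1 * psi(x1)) / ALPHA2
        if psi(0.0) <= y2 <= psi(1.0):
            pts.append((x1, psi_inv(y2)))
    return pts
```

As x1 increases, ψ(x1) increases, so the required ψ(x2) decreases; the monotonicity of ψ then forces x2 to decrease, which is the decreasing-curve property used above.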
Proofs

Proof of Lemma 1: Let k ≥ 2 be fixed and let an integer 1 ≤ s ≤ k−1 be given. Then:
according to the Appendix, and we consider the inequality:
We seek a lower bound on |zs,k−z′s,k| as a function of s, and toward this end we consider the formula:obtained from Eq. (A1). Using the lower positive bound:the inequality:tells us that the right side of Eq. (14) cannot attain a minimum value
References

Sprecher, D. A. (1996). A numerical implementation of Kolmogorov's superpositions. Neural Networks.
Sprecher, D. A. (1997). A numerical implementation of Kolmogorov's superpositions II. Neural Networks.