Neural Networks

Volume 15, Issue 1, January 2002, Pages 57-67

Contributed article
Space-filling curves and Kolmogorov superposition-based neural networks

https://doi.org/10.1016/S0893-6080(01)00107-1

Abstract

Kolmogorov superpositions and Hecht-Nielsen's neural network based on them are dimension reducing. This dimension reduction can be understood in terms of space-filling curves that characterize Kolmogorov's functions, and the subject of this paper is the construction of such a curve. We construct a space-filling curve with Lebesgue measure 1 in the unit square $[0,1]^2$, with approximating curves $\Lambda_k$, $k=1,2,3,\dots$, each with $10^{2k}$ rational nodal points whose order is determined, for each $k$, by the linear order of their image points under a nomographic function $y=\alpha_1\psi(x_1)+\alpha_2\psi(x_2)$ that is the basis of a computable version of the Kolmogorov superpositions in two dimensions. The function $\psi:[0,1]\to[0,1]$ is continuous and monotonic increasing, and $\alpha_1$, $\alpha_2$ are suitable constants. The curves $\Lambda_k$ are composed of families of disjoint closed squares of diminishing diameters and connecting joins of diminishing lengths as $k\to\infty$.

Introduction

Referring to the Nomenclature, consider the two-dimensional version of the Kolmogorov (1957) superpositions:
$$f(x_1,x_2)=\sum_{q=0}^{4}\Phi_q(y_q),\qquad(1)$$
in which an arbitrary real-valued continuous function $f:E^2\to\mathbb{R}$ is computed with continuous functions $\Phi_q:E\to\mathbb{R}$. The arguments
$$y_q=\alpha_1\psi(x_1+qa)+\alpha_2\psi(x_2+qa)\qquad(2)$$
are fixed nomographic functions that are independent of $f$. The function $\psi:E\to E$ is monotonic increasing and continuous, and $\alpha_1$, $\alpha_2$ and $a$ are suitable constants (Sprecher, 1996, Sprecher, 1997). In a pioneering paper, Hecht-Nielsen (1987) linked the $n$-dimensional version of the Kolmogorov superpositions to computer architecture by interpreting them as a four-layer architecture of a feedforward neural network, as represented schematically in Fig. 1. This architecture consists of two pairs of nonlinear-linear layers, the first pair constituting a dimension-dependent and otherwise fixed hidden layer, and the second pair constituting the output layer in which an arbitrary target function $f$ is implemented. The nonlinear activation functions of the units in the hidden layer may therefore be implemented in hardware and hard-coded into the network.
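To make the layered structure concrete, the following is a minimal Python sketch of the forward pass defined by Eqs. (1) and (2). The inner function psi, the output functions Phi_q, and the constants alpha1, alpha2 and a are caller-supplied placeholders, not the specific functions and constants constructed in this paper.

```python
from typing import Callable, Sequence

def kolmogorov_superposition_2d(
    psi: Callable[[float], float],            # fixed inner function (placeholder)
    Phi: Sequence[Callable[[float], float]],  # five outer functions Phi_0, ..., Phi_4 encoding f
    alpha1: float,
    alpha2: float,
    a: float,
) -> Callable[[float, float], float]:
    """Return (x1, x2) -> sum_{q=0}^{4} Phi_q(alpha1*psi(x1 + q*a) + alpha2*psi(x2 + q*a))."""
    def f_hat(x1: float, x2: float) -> float:
        total = 0.0
        for q in range(5):
            # Hidden layer (Eq. (2)): nomographic arguments y_q, independent of f.
            y_q = alpha1 * psi(x1 + q * a) + alpha2 * psi(x2 + q * a)
            # Output layer (Eq. (1)): the functions Phi_q carry all information about f.
            total += Phi[q](y_q)
        return total
    return f_hat

# Example with placeholder choices (not the constructions of this paper):
f_hat = kolmogorov_superposition_2d(lambda t: t, [lambda y: y] * 5, 0.5, 0.5, 0.1)
print(f_hat(0.3, 0.7))
```

In this reading, the hidden layer computes the five values $y_q$ once and for all, while only the functions $\Phi_q$ depend on the target $f$.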

The functions in Eq. (2) are the most basic nonlinear continuous functions that can be used to replace a pair of variables $(x_1,x_2)$ with new variables $y_q$, with the result that the computation of the two-variable function $f(x_1,x_2)$ is carried out exclusively in terms of the five one-variable functions $\Phi_q(y_q)$. This dimension reduction is accomplished with the binary operation ‘+’ in Eq. (2), and correspondingly in the hidden layer of the network. Here we examine the underlying topological properties of these functions that enable this dimension reduction, and we begin with the general statement that there exist continuous curves passing through all points of a square or a subset thereof, thereby inducing a linear order on an infinite dense set of points $(x_1,x_2)$. The question of the existence of such curves followed Cantor's 1878 work, which demonstrated, among other things, that there is a one–one correspondence between the points of a square and the points of an interval. The first space-filling curve, as these curves are called, was discovered by Peano in 1890, and mathematicians have since followed his discovery with a detailed study of their properties as well as the construction of a rich variety of space-filling curves (Sagan, 1991). Each of the functions in Eq. (2) determines a space-filling curve with the specific properties that are required to obtain Eq. (1). The subject of this paper is the construction of a space-filling curve characterizing the functions in Eq. (2), thereby illuminating the algorithm leading to the Kolmogorov superpositions and Hecht-Nielsen's neural network.

Following Kolmogorov's original strategy, Eq. (1) is established through the construction of suitable families of disjoint closed squares of increasing refinement for each value of $q$, each family covering the unit square $E^2$ except for narrow horizontal and vertical strips. Each function in Eq. (2) is such that it maps its families of squares one–one onto families of disjoint closed intervals. The diameters of the squares and the widths of the gaps separating them tend asymptotically to zero, and so do the lengths of the corresponding intervals and their separating gaps. When the squares of each family are appropriately joined pairwise at each stage, they form curves (chains) that converge to a space-filling curve. We are able to refer to such chains as curves because a square can be mapped onto its diagonal. The constructions in this paper are for the case $q=0$ in Eq. (2).
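As a purely illustrative sketch of the chaining step (assuming the family of squares and its induced order are already given; this is not the paper's construction of the squares themselves), each closed square can be collapsed onto its diagonal and consecutive squares connected by straight joins:

```python
from typing import List, Tuple

Square = Tuple[float, float, float]  # (lower-left x, lower-left y, side length)

def chain_of_squares(ordered_squares: List[Square]) -> List[Tuple[float, float]]:
    """Vertices of a polygonal chain: the diagonal of each square in the given order,
    with the segment between consecutive squares playing the role of the join."""
    vertices: List[Tuple[float, float]] = []
    for x0, y0, side in ordered_squares:
        vertices.append((x0, y0))                # one endpoint of the diagonal
        vertices.append((x0 + side, y0 + side))  # the other endpoint
    return vertices

# Three disjoint squares, listed in the order induced by their image intervals:
print(chain_of_squares([(0.0, 0.0, 0.2), (0.4, 0.3, 0.2), (0.7, 0.6, 0.2)]))
```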

Section 1 is devoted to a review of those properties of the function $y_0$ that are necessary for the analytic construction of a space-filling curve $\Lambda$ in Section 2. This construction culminates in a lemma and four theorems, all proved in Section 4. Section 3 contains a geometric construction of approximating curves $\Lambda_k$. MATLAB was used to generate computer graphs of $\Lambda_0$ and $\Lambda_1$. These graphs clearly reveal the non-linearity of the function $y_0$. An examination of Kolmogorov's (1957) original functions $y_r=\psi_{1,r}(x_1)+\psi_{2,r}(x_2)$, $r=0,1,\dots,4$, suggests that these also generate similar space-filling curves, but this has yet to be verified.


The nomographic function ξ(x)

Our starting point is the nomographic function
$$\xi(x)=\alpha_1\psi(x_1)+\alpha_2\psi(x_2)$$
with $\psi$ as defined in the Appendix, having rational values at the points
$$d_k=\sum_{r=1}^{k}i_r10^{-r},\qquad i_r=0,1,2,\dots,9,\quad k=1,2,3,\dots.$$
The normalized constants $\alpha_1$, $\alpha_2$, also defined in the Appendix, are rationally independent, so that if we set $z_k=\xi(d_k)$ and $z'_k=\xi(d'_k)$, then for fixed $k$, $z_k=z'_k$ if and only if $d_k=d'_k$. Consequently, $d_k=\xi^{-1}(z_k)$ is well defined for fixed $k$, and we list the image points $z_k=\xi(d_k)$ in increasing order:
$$0=z_k^1<z_k^2<z_k^3<\dots<z_k^{10^{2k}}$$
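This ordering can be computed directly: enumerate the $10^{2k}$ grid points $d_k$, evaluate $z_k=\xi(d_k)$, and sort. The sketch below is a hedged illustration only; the true $\psi$ and the normalized constants $\alpha_1$, $\alpha_2$ are those of the Appendix, whereas the monotone psi used here is a stand-in, so coincident images that the actual construction excludes are broken arbitrarily by the sort.

```python
import itertools
from typing import Callable, List, Tuple

def ordered_nodal_points(
    k: int,
    psi: Callable[[float], float],  # stand-in for the Appendix's psi
    alpha1: float,
    alpha2: float,
) -> List[Tuple[float, float]]:
    """Return the 10**(2k) grid points d_k of E^2, whose coordinates are the
    terminating decimals sum_{r=1}^{k} i_r * 10**(-r), sorted by z_k = xi(d_k)."""
    decimals = [i / 10 ** k for i in range(10 ** k)]  # the k-digit decimals in [0, 1)
    grid = itertools.product(decimals, decimals)      # all 10**(2k) pairs
    return sorted(grid, key=lambda d: alpha1 * psi(d[0]) + alpha2 * psi(d[1]))

# k = 1 with a stand-in psi: the sorted list gives the nodal-point order of Lambda_1.
print(ordered_nodal_points(1, psi=lambda t: t ** 2, alpha1=1.0, alpha2=0.5)[:5])
```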

The curve Λ

We begin with the observations that every infinite sequence of nested intervals $T_{k_1}(z_{k_1})\supset T_{k_2}(z_{k_2})\supset\dots$ determines a unique point
$$z=\bigcap_{r=1}^{\infty}T_{k_r}(z_{k_r})$$
as well as a unique sequence of squares $S_{k_1}(d_{k_1}), S_{k_2}(d_{k_2}),\dots$. This sequence may contain a nested sequence $S_{k_s}(d_{k_s})\supset S_{k_{s+1}}(d_{k_{s+1}})\supset\dots$ defining a unique image point of $z$ for some integer $s\ge1$. We shall show that the sequence $d_{k_1}, d_{k_2}, d_{k_3},\dots$ converges also when no such nested sequence exists. When $z\in E$ is not the infinite intersection of intervals $T_{k_r}(z_{k_r})$ then …
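In compact form, writing $\Lambda(z)$ for the image point of $z$ (a summary of the statement above, not an additional assumption):
$$z=\bigcap_{r\ge1}T_{k_r}(z_{k_r}),\qquad \Lambda(z)=\lim_{r\to\infty}d_{k_r},$$
and when the squares are eventually nested, $\Lambda(z)$ is the single point of $\bigcap_{r\ge s}S_{k_r}(d_{k_r})$.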

Approximating curves

We begin with:

Definition 1

Let a point $z\in E$ be given. The curve $\xi^{-1}(z)=\{x\in E^2:\xi(x)=z\}$ is called a level-curve of the function $\xi$.

Because the function $\psi$ is continuous and monotonic increasing and the constants $\alpha_1$ and $\alpha_2$ are positive, $\xi^{-1}(z)$ is a continuous monotonic decreasing curve for fixed $z$, with initial and end points on the boundary of $E^2$. For fixed $k$ and $\omega=1,2,3,\dots$ we now connect each square $S_k(d_k^{\omega})$ to its immediate successor $S_k(d_k^{\omega+1})$ with a join to form a continuous chain (approximating curve) …
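Because $\psi$ is increasing and $\alpha_1,\alpha_2>0$, a level curve can be traced numerically by solving $\alpha_1\psi(x_1)+\alpha_2\psi(x_2)=z$ for $x_2$ at sampled values of $x_1$. The following sketch does this by bisection with a stand-in psi; it illustrates the monotonicity argument only, not the paper's construction of the joins.

```python
from typing import Callable, List, Optional, Tuple

def level_curve(
    z: float,
    psi: Callable[[float], float],  # continuous, increasing on [0, 1] (stand-in)
    alpha1: float,
    alpha2: float,
    samples: int = 50,
    tol: float = 1e-10,
) -> List[Tuple[float, float]]:
    """Sample xi^{-1}(z) = {(x1, x2) in E^2 : alpha1*psi(x1) + alpha2*psi(x2) = z}."""
    def solve_x2(x1: float) -> Optional[float]:
        # The residual is increasing in x2, so bisection finds the unique root, if any.
        def residual(x2: float) -> float:
            return alpha1 * psi(x1) + alpha2 * psi(x2) - z
        lo, hi = 0.0, 1.0
        if residual(lo) > 0 or residual(hi) < 0:  # z is not attained on this vertical line
            return None
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if residual(mid) < 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    points: List[Tuple[float, float]] = []
    for i in range(samples + 1):
        x1 = i / samples
        x2 = solve_x2(x1)
        if x2 is not None:
            points.append((x1, x2))
    return points

# As x1 increases the solved x2 decreases: the level curve is monotonic decreasing.
print(level_curve(0.9, psi=lambda t: t ** 3, alpha1=1.0, alpha2=0.5)[:3])
```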

Proofs

Proof of Lemma 1

Let $k\ge2$ be fixed and let an integer $1\le s\le k-1$ be given. Then:

$$|\psi(d_{2,k})-\psi(d'_{2,k})|\ge\frac{1}{10^{2s-1}}$$

according to the Appendix, and we consider the inequality:
$$|z_{s,k}-z'_{s,k}|\ge\frac{1}{\alpha+1}\left[\psi(d_{1,k})-\psi(d'_{1,k})\right]+\frac{\alpha}{10^{2s-1}}=\frac{1}{\alpha+1}A_{1,k}+\frac{\alpha}{10^{2s-1}}.$$

We seek a lower bound on $|z_{s,k}-z'_{s,k}|$ as a function of $s$, and toward this end we consider the formula
$$A_{1,k}=\sum_{r=1}^{s}\frac{\tilde{\imath}_{1,r}-\tilde{\imath}'_{1,r}}{2^{m_r}10^{2r-m_r-1}}+\sum_{r=s+1}^{k}\frac{\tilde{\imath}_{1,r}-\tilde{\imath}'_{1,r}}{2^{m_r}10^{2r-m_r-1}}=B_{1,s}+B_{s+1,k}$$
obtained from Eq. (A1). Using the lower positive bound
$$|B_{1,s}|\ge\frac{1}{10^{2s-1}},$$
the inequality
$$\left|B_{1,s}+\frac{\alpha}{10^{2s-1}}\right|\ge\frac{1}{10^{2s-1}}-\frac{\alpha}{10^{2s-1}}=\frac{1-\alpha}{10^{2s-1}}$$
tells us that the right side of Eq. (14) cannot attain a minimum value …

