
Neurocomputing

Volume 73, Issues 1–3, December 2009, Pages 468-477

A linear ridgelet network

https://doi.org/10.1016/j.neucom.2009.07.006

Abstract

The multiscale properties of the receptive field of the human visual cortex have illuminated research on the wavelet neural network (WNN). Findings in neurophysiology indicate that the human visual system contains specialized areas of the visual cortex that respond to particular orientations. In other words, the receptive field of the visual cortex has multiresolution properties in direction, as well as in localization and scale. Motivated by these facts, a three-layer feed-forward neural network (FNN) is presented that employs ridgelets as the activation function of the hidden layer. To achieve rapid learning on high-dimensional samples, we propose an efficient linear learning algorithm inspired by the traditional kernel smoothing method, whose computational complexity is linear in the number and dimension of the samples. At the cost of a slight degradation in accuracy, the network achieves rapid learning. Simulation experiments on function approximation are carried out, and several commonly used regression methods are evaluated under the same conditions for comparison. The results show that the proposed linear ridgelet network can overcome the curse of dimensionality in the training of FNNs and exhibits better performance in high dimensions than its counterparts, especially when the function contains spatial inhomogeneities.

Introduction

Multi-dimensional function approximation (MDFA), which develops models from observed samples, is a fundamental problem in machine learning (ML); it is referred to as regression, data modeling, or system identification depending on the community involved [1]. Given a set of example pairs, MDFA aims to find an inner structure or mapping between the input and output samples. Some parametric structures, such as splines and wavelets, have become standard tools in MDFA for input spaces with up to three dimensions [2], [3]. However, much of univariate approximation theory does not generalize well to higher-dimensional spaces [4]. Neural networks (NNs) provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques [5]. As a powerful nonparametric approach to MDFA, feed-forward neural networks (FNNs) are by far one of the most popular architectures among NNs due to their structural flexibility, good representational capabilities, and the availability of a large number of training algorithms. Apart from function approximation, FNNs have served as powerful computational tools in a diversity of applications including classification, prediction, and control [6], [7]. Among constructive FNNs, the single-hidden-layer FNN is the simplest in terms of both structure and training efficiency, yet it has wide applicability due to its "universal approximation property" [8], [9]. The activation function in the hidden layer of an FNN determines what kinds of functions the network is capable of approximating and strongly influences the network's performance. Various activation functions have been employed in different networks to give a good representation of the observed data, such as the hard-limiting function used in the multilayer perceptron (MLP) [10], the sigmoid function in the BP network (BPN) [11], the locally receptive field functions in the radial basis function network (RBFN) [12], and the wavelet function in the wavelet neural network (WNN) [13]. These activation functions imitate the behavior of human brain neurons; however, the imitations are too oversimplified to reflect the actual biological system.
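For concreteness, a minimal sketch of these four activation families follows; the particular mother wavelet shown (the Mexican hat) is our illustrative choice, not one fixed by the paper:

```python
import numpy as np

def hard_limit(t):
    # Hard-limiting threshold unit, as in the MLP [10]
    return np.where(t >= 0.0, 1.0, 0.0)

def sigmoid(t):
    # Sigmoid unit, as in the BP network [11]
    return 1.0 / (1.0 + np.exp(-t))

def gaussian_rbf(x, c, s):
    # Locally receptive Gaussian unit, as in the RBFN [12]
    return np.exp(-np.sum((np.asarray(x) - np.asarray(c)) ** 2) / (2.0 * s ** 2))

def mexican_hat(t):
    # A common 1-d mother wavelet for WNNs [13] (our illustrative choice)
    return (1.0 - t ** 2) * np.exp(-(t ** 2) / 2.0)
```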

The multiscale properties of the receptive field of the human visual cortex have illuminated research on the WNN. The wavelet has multiresolution characteristics in scale and position, which is very similar to the properties of visual neurons. The origin of the WNN can be traced back to the work of Daugman, in which Gabor wavelets were used for image classification [14]. WNNs prove to be optimal in the sense that they require the smallest possible number of bits to store in order to reconstruct a function within a given precision. WNNs became popular after the work of Zhang [13], Pati [15], and Szu [16]. Zhang applied wavelet networks to the problem of controlling a robot arm. Later, Jiao proposed a multi-wavelet neural network model [17]. Kreinovich proved that the WNN is an asymptotically optimal approximator for univariate functions [18].

Though the WNN works well in one dimension, it cannot be extended successfully to higher dimensions. One main reason is the difficulty of constructing high-dimensional wavelet functions. The commonly used form of multidimensional wavelets is a simple tensor product of 1-d wavelets, which fails to capture the geometrical regularity in multidimensional data. For example, 2-d tensor product wavelets have only three directions, namely the horizontal, vertical, and diagonal directions, as shown below. An analysis of wavelets shows that the wavelet function lacks a description of direction in high dimensions. Research in neurophysiology indicates that the receptive field of the visual cortex has the multiresolution property not only in scale and location, but also in direction [19]. That is, some neurons in the brain are in charge of specific directions, and the neurons can "see" and identify an object by virtue of their sensitivity to directions. Direction is an important attribute of data in high-dimensional space. Therefore, a new tool that is as efficient in high dimensions as the wavelet is in one dimension is desirable.
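For reference, the standard 2-d tensor-product construction from a 1-d scaling function and mother wavelet yields exactly these three oriented wavelets:

```latex
% Three 2-d tensor-product wavelets, built from a 1-d scaling
% function \varphi and mother wavelet \psi, one per orientation:
\psi^{H}(x_1, x_2) = \varphi(x_1)\,\psi(x_2)  % horizontal detail
\psi^{V}(x_1, x_2) = \psi(x_1)\,\varphi(x_2)  % vertical detail
\psi^{D}(x_1, x_2) = \psi(x_1)\,\psi(x_2)     % diagonal detail
```

No other orientations are representable at a given scale, which is why intermediate directions in the data are captured poorly.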

As an extension of the wavelet to higher dimensions, the ridgelet is a recently developed geometrical multiscale analysis (GMA) tool [20], [21], [22], [23]. Compared with the wavelet, the ridgelet defines a direction in addition to the scale and position in its expression, which is more consistent with a real biological neuron, so it is more suitable for describing the sensitivity of neurons to directions. It proves to be a new basis for representing multi-dimensional functions in a more stable way, and it can deal with linear and hyperplane singularities efficiently [20]. In [24], [25] we proposed ridgelet neural network models that employ ridgelets as the activation function of the hidden neurons in a three-layer FNN. In [24], a multiresolution ridgelet neural network with ridgelet neurons and a variable structure is presented based on the ridgelet frame, and in [25], a ridgelet kernel learning machine is proposed based on the least squares support vector machine (LS-SVM) from statistical learning theory. Because the ridgelet defines an additional direction, the network can deal with high-dimensional functions more efficiently and in a stable way. Moreover, the ridgelet proves to be the best basis for a group of functions with certain kinds of spatial inhomogeneities, so the network can deal with a wider range of functions, especially those with hyperplane singularities.
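A minimal sketch of a single ridgelet unit in the standard continuous form of [20], psi_{a,u,b}(x) = a^{-1/2} psi((u · x − b)/a), follows; the mother function psi shown is our illustrative choice:

```python
import numpy as np

def mexican_hat(t):
    # Illustrative mother function; the paper specifies psi later.
    return (1.0 - t ** 2) * np.exp(-(t ** 2) / 2.0)

def ridgelet(x, u, a, b, psi=mexican_hat):
    # psi_{a,u,b}(x) = a^{-1/2} * psi((u . x - b) / a):
    # constant along the hyperplanes u . x = const (the "ridges"),
    # wavelet-like across them in the direction u.
    return psi((np.dot(u, x) - b) / a) / np.sqrt(a)
```

The unit direction u is what the tensor-product wavelet lacks: it lets a single neuron align with an arbitrary hyperplane singularity rather than only the axis and diagonal orientations.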

It is well known that when a network is trained by examples, it is important to improve its generalization ability and at the same time reduce its complexity [26], [27]. In the constructed neural network, more parameters must be learned as the dimension of the input samples increases. To obtain the same approximation accuracy as in one dimension, an exponentially increasing number of training samples is required, which is known as the "curse of dimensionality". In [24], there are fewer parameters to be determined because the parameters lie in the binary ridgelet frame, and only the connection weights need to be determined. Continuous ridgelet neurons can bring stronger approximation capability; however, the learning of parameters is then executed in a higher-dimensional space, which makes the search process time-consuming and difficult. The high computational complexity in time and space of the support vector machine (SVM) is well known in machine learning; the time taken to carry out model selection in an SVM can be unacceptably long for some contemporary applications. Although a least squares SVM (LS-SVM) is adopted in [25], its training process in high dimensions is still tediously long.

Aiming at fast training of the ridgelet network, this paper introduces a linear learning algorithm inspired by the classical kernel smoothing method, which is widely used in many scientific and engineering areas, such as regression, pattern recognition, and signal and image processing [28], [29], [30], [31]. Based on the kernel density estimation technique, a novel kernel density estimation algorithm is employed for efficient construction of the ridgelet network; it has linear complexity in the number of training samples and generally delivers the same level of approximation accuracy as the SVM. Moreover, a genetic algorithm (GA) is used to optimize the directions of the ridgelets. The default width of the regression is derived from the optimal bandwidth of Gaussian kernel density estimation suggested in the literature [32], [33]. Once the directions of the ridgelets are determined, the training has linear time and space complexity in the number of samples via a mathematical approximation, at the cost of a slight loss of precision.
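The paper's estimator operates on ridge projections and is detailed in Section 3; as a minimal illustration of the kernel smoothing idea and the rule-of-thumb bandwidth [32], [33], a 1-d Gaussian Nadaraya-Watson smoother might look as follows (this sketch is our assumption, not the paper's algorithm):

```python
import numpy as np

def silverman_bandwidth(t):
    # Rule-of-thumb optimal bandwidth for Gaussian kernel density
    # estimation, h = 1.06 * std(t) * n^(-1/5) [32], [33].
    return 1.06 * np.std(t) * len(t) ** (-1.0 / 5.0)

def kernel_smooth(t_query, t_train, y_train, h=None):
    # Nadaraya-Watson estimate: kernel-weighted average of the targets.
    # Cost is O(n) per query point, i.e. linear in the sample size.
    if h is None:
        h = silverman_bandwidth(t_train)
    w = np.exp(-0.5 * ((t_query[:, None] - t_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)
```

The linearity in sample count of such smoothers is what motivates replacing the SVM-style optimization of [25] with a kernel-based construction.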

The rest of this paper is organized as follows. In Section 2, we discuss some related regression methods and construct a three-layer ridgelet neural network based on the ridgelet approximation. In Section 3, a linear learning algorithm is introduced based on the kernel smoothing method. In Section 4, simulation experiments are carried out to illustrate the efficiency and superiority of the linear ridgelet network over its counterparts. Finally, conclusions are drawn in Section 5. In the Appendix, a theoretical analysis of the approximation properties of the network is presented, including the approximation capability and rate.

Section snippets

Construction of ridgelet network

Reconstructing an unknown function by a linear combination of elements from a dictionary is a very inspiring idea on which many regressors are based. Many approximants have been established, such as linear basis expansions, nonlinear transform-based superposition models, and pursuit algorithms. In contrast to transform-based estimators, adaptive approximation techniques can overcome the limitation of linear techniques and bypass the "curse of dimensionality" by adaptively selecting

Linear ridgelet network

The training of the parameters in the ridgelet network is a tough task. Typically, training involves the numerical optimization of the error between the data and the actual network output with respect to its adjustable parameters or weights. From the above we know that for training the network in Fig. 1, the parameters to be determined include N, w, a, b, and u, totaling 1+mN+N+N+(d−1)N = 1+(d+m+1)N parameters. If the dimension of the input samples is high and the network has a large structure, the
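A plausible minimal reading of this parameter count, assuming the network of Fig. 1 computes f_hat(x) = sum_j w_j psi((u_j · x − b_j)/a_j) (whether the a^{-1/2} normalization is folded into w is a modeling choice we make here):

```python
import numpy as np

def mexican_hat(t):
    # Illustrative mother function (our assumption)
    return (1.0 - t ** 2) * np.exp(-(t ** 2) / 2.0)

def ridgelet_network(X, U, a, b, W, psi=mexican_hat):
    # X: (n, d) inputs; U: (N, d) unit directions, each with d-1 free angles;
    # a, b: (N,) scales and shifts; W: (N, m) hidden-to-output weights.
    # Free parameters: 1 (N itself) + mN (W) + N (a) + N (b) + (d-1)N (U)
    #                = 1 + (d + m + 1)N, matching the count above.
    T = (X @ U.T - b) / a      # (n, N) ridge projections (u_j . x - b_j)/a_j
    return psi(T) @ W          # (n, m) network outputs
```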

Simulation experiment

As described above, the linear ridgelet network can choose adequate ridgelet neurons to obtain the equal-spaced directional samples, and then the parameters of the network can be determined. At the expense of a slight degradation of the regression precision, the computational complexity is linear in the number and dimension of the input samples. In the following, some experiments are carried out to investigate the performance of the proposed method.
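The snippet does not reproduce the direction-sampling scheme; in two dimensions, one natural reading of "equal-spaced directional samples" is equally spaced angles on the half-circle (our assumption; higher dimensions would need, e.g., near-uniform points on the unit sphere):

```python
import numpy as np

def equispaced_directions_2d(K):
    # K equally spaced unit directions on the half-circle; u and -u
    # parameterize the same ridge orientation, so the half-circle suffices.
    angles = np.pi * np.arange(K) / K
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)
```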

Conclusions

In recent years, geometrical multiscale analysis (GMA), which is more suitable than wavelets for approximating high-dimensional functions, has been applied in various scientific and engineering communities. As an efficient GMA tool, the ridgelet extends the advantages of the wavelet in locating point-like singularities to higher dimensions, and proves to be the best basis for a group of functions with hyperplane singularities. Inspired by the geometric multiscale properties of the

References (39)

  • T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference and Prediction (2001).
  • M. Nørgaard et al., Neural Networks for Modelling and Control of Dynamic Systems, ser. Advanced Textbooks.
  • T. Chen et al., Universal approximation to nonlinear operators by neural networks with arbitrary activation function and its application to dynamical systems, Neural Networks (1995).
  • S. Trenn, Quantitative analysis of neural networks as universal function approximators, M.S., Diplomarbeit, Fakultät...
  • W.S. McCulloch et al., A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics (1943).
  • G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems (1989).
  • J. Zhang et al., Wavelet neural networks for function learning, IEEE Transactions on Signal Processing (1995).
  • J. Daugman, Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Transactions on Acoustics, Speech and Signal Processing (1988).
  • Y.C. Pati et al., Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations, IEEE Transactions on Neural Networks (1992).

    Shuyuan Yang received the B.A. degree in electrical engineering from Xidian University, Xi’an, China, in 2000, the M.S. degree and Ph.D. degree in Circuit and System from Xidian University, Xi’an, China, in 2003 and 2005, respectively. Her main current research interests are intelligent signal and image processing.

    Min Wang received the B.A. degree in automatic control from Xidian University, Xi’an, China, in 2000, the M.S. degree and Ph.D. degree in Signal and Information Processing from Xidian University, Xi’an, China, in 2003 and 2005, respectively. He is currently working as a lecturer in National Key Lab of Radar Signal Processing in the Department of Electrical Engineering in Xidian University. His current interests include radar signal processing, ultra wideband radio technology and multi-sensor signal processing.

    Licheng Jiao received the B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982 and the M.S. and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 1984 and 1990, respectively. He is currently Professor and Dean of the electronic engineering school at Xidian University. His research interests include neural networks, data mining, nonlinear intelligence signal processing, and communication.
