Signal Processing

Volume 201, December 2022, 108708

A joint learning framework for Gaussian processes regression and graph learning

https://doi.org/10.1016/j.sigpro.2022.108708

Highlights

  • In this paper, we propose a novel GPR model, where multi-dimensional sample inputs are viewed as signals generated over an underlying graph. A joint learning framework is developed to simultaneously estimate the covariance matrix of target values, determined by a chosen kernel function, and the underlying graph of sample inputs. In this way, more topological information about the inputs can be exploited to estimate the covariance matrix of outputs and thus improve the prediction accuracy of the resulting GP model.

  • Two graph learning algorithms are adopted in our GPR framework. The first is based on the classical smoothness assumption over an undirected graph of input samples. In the second, the self-representative (or self-explanatory) property of sample inputs is harnessed to infer an input graph.

  • We also develop novel numerical algorithms for the proposed models, which provide more accurate and efficient graph estimation than state-of-the-art methods.

Abstract

In traditional Gaussian process regression (GPR), the covariance matrix of outputs is determined by a given kernel function, which generally depends on the pairwise distance or correlation between sample inputs. Nevertheless, such models hardly utilize high-order statistical properties or global topological information among sample inputs, undermining their prediction capability. To remedy this defect, we propose in this paper a novel GPR framework that combines the MLE of Gaussian processes with graph learning. In our model, sample inputs are modeled by a weighted graph, whose topology is directly inferred from the sample inputs based on either the smoothness assumption or the self-representative property. Such global information can be viewed as a priori knowledge, guiding the learning of hyper-parameters of the chosen kernel function and the construction of the covariance matrix of GPR model outputs. In practice, the hyper-parameters of the GPR model and the adjacency matrix of the graph can be trained by alternating optimization. Theoretical analyses regarding solutions to graph learning are also presented to reduce computational complexity. Experimental results demonstrate that the proposed framework achieves competitive performance in terms of prediction accuracy and computational efficiency, compared to state-of-the-art GPR algorithms.

Introduction

Regression models based on Gaussian processes (GPs) are a powerful tool in various applications [1], [2]. Their objective is to reconstruct the underlying signals or functions mapping inputs to outputs. Gaussian process regression (GPR) relies on the general assumption that similar inputs likely lead to similar target values, which follow a joint Gaussian distribution. The similarity of target values is described in a GP by the covariance matrix, which in turn depends on a chosen kernel function. A wide range of kernel functions, such as the squared exponential (SE) kernel, the rational quadratic (RQ) kernel, periodic (PE) kernels [3], and the spectral mixture (SM) kernel [4], have been deployed in GP models. Among them, the SE kernel is the most popular; it describes the relationship of two target values using the Euclidean distance between their corresponding inputs. In practice, the prediction of target values can be conducted by means of Bayesian inference.

The performance and complexity of a GPR model are generally determined by both the chosen kernel function and the volume of data. Nowadays, the optimization of the hyper-parameters of kernel functions is still a challenging task that involves computing the inverse of the covariance matrix. Its computational cost is generally high, especially when dealing with complicated kernel functions [1]. To remedy this issue, inducing point methods reduce the effective number of input data from $N$ to $K \ll N$ and use these $K$ inducing points to construct an approximate covariance matrix. Since the rank of the effective covariance matrix is smaller than that of the original one, inducing point methods can deal with a large volume of data. Typical examples of this class include the subset of regressors (SoR) [5], fully independent training conditional (FITC) [6], partially independent training conditional (PITC) [1], and structured kernel interpolation (SKI) [7]. The motivation of SoR is to replace the covariance matrix of the original inputs with a low-rank counterpart composed of the covariance matrix of the inducing points and the cross-covariance between the inducing points and the training data. It can also be viewed as approximating the original inputs by a linear transformation of the inducing points. Building on SoR, FITC and PITC have been developed under different conditional independence assumptions regarding the inducing points. Compared to SoR, their approximate covariance matrices are closer to the original one while requiring no extra computational burden. SKI is a scalable approach, also based on SoR. It places inducing points on a dense grid so that their covariance matrix takes certain structured forms, e.g., a Kronecker product [8] or a diagonal-constant (Toeplitz) matrix [9]. Furthermore, it relaxes the restriction that input points must lie on a grid. Structured matrix algebra [10] has been employed to relieve the computational cost of the covariance matrix between inducing points and inputs, making the approach applicable to large-scale datasets.
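To make the SoR idea concrete, here is a minimal numerical sketch, assuming an SE kernel and randomly chosen inducing points; all names and values are illustrative, not the cited papers' implementations.

```python
# Sketch of the subset-of-regressors (SoR) low-rank covariance approximation.
import numpy as np

def se_kernel(A, B, length=1.0, amp=1.0):
    """Squared exponential kernel matrix between row-wise inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return amp**2 * np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # N = 500 training inputs
Z = X[rng.choice(500, 20, replace=False)]      # K = 20 << N inducing points

K_nm = se_kernel(X, Z)                         # cross-covariance, N x K
K_mm = se_kernel(Z, Z) + 1e-8 * np.eye(20)     # inducing covariance (jittered)

# SoR replaces the full N x N covariance with the rank-K surrogate
# K_NN ~= K_nm K_mm^{-1} K_nm^T, so subsequent solves cost far less.
K_sor = K_nm @ np.linalg.solve(K_mm, K_nm.T)
print(np.linalg.matrix_rank(K_sor))            # at most K = 20
```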

In traditional GPR, the hyper-parameters of kernel functions are often learned by maximum likelihood estimation (MLE). But the resulting model is generally a nonconvex optimization problem, even for simple kernel functions such as the SE kernel. Unsuitable initial values of the hyper-parameters can lead to a local optimum far from the global solution, damaging the prediction accuracy of the resulting GPR model. Another, more important issue with traditional GPR is that kernel functions are generally determined by the pairwise distance or correlation between sample inputs. High-order statistical properties or the global topology of the whole set of inputs are not fully exploited in the current framework, which essentially undermines its modeling capability. Recently, some researchers have introduced additional a priori information to guide the estimation of the covariance matrix. For instance, an approximate precision matrix of the target values is learned jointly with the covariance matrix in Miao et al. [11]. Since the problem of estimating the precision matrix is convex, its optimal solution can be reliably obtained and used in the training of the covariance matrix.
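To illustrate the nonconvexity issue, the following hedged sketch minimizes the standard negative log marginal likelihood of a zero-mean GP with an SE kernel, using random restarts as one common mitigation; the parameterization is our assumption, not this paper's exact formulation.

```python
# Sketch of hyper-parameter MLE for a zero-mean GP with SE kernel plus noise.
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(params, X, y):
    """-log p(y | X) under N(0, K + sigma^2 I); params are log-scaled."""
    log_amp, log_len, log_noise = params
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = np.exp(2 * log_amp) * np.exp(-0.5 * d2 / np.exp(2 * log_len))
    K += (np.exp(2 * log_noise) + 1e-6) * np.eye(len(y))   # noise + jitter
    L = np.linalg.cholesky(K)                               # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # K^{-1} y
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

# Several random restarts mitigate the local optima noted above.
best = min((minimize(neg_log_marginal_likelihood, rng.normal(size=3), args=(X, y))
            for _ in range(5)), key=lambda r: r.fun)
print(np.exp(best.x))   # recovered amplitude, length-scale, noise std
```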

Graphs play an important role in many signal processing and machine learning tasks, since they are able to describe both regular and irregular data. A graph is composed of vertices (or nodes) and edges. Vertices represent various entities, while edges denote not only concrete but also abstract relationships among the vertices. Given sample data, various algorithms have been developed to estimate their topological structure [12]. For instance, a framework has been proposed in Egilmez et al. [13] to estimate graph Laplacians from observed data under structural constraints. It has been shown that the graph Laplacian can be treated as the precision matrix in maximum a posteriori parameter estimation of Gaussian–Markov random field models. This offers the potential to jointly learn the Laplacian matrix of a graph and the covariance matrix of a GP. However, the computational complexity of the graph learning approaches developed in Egilmez et al. [13] can be high, since they can only handle the Laplacian matrix as a whole.
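As a small illustration of the Laplacian-as-precision view, the sketch below builds the combinatorial Laplacian L = D - W from a made-up weighted adjacency matrix; the matrix is ours, chosen only for demonstration.

```python
# Combinatorial graph Laplacian L = D - W of a small weighted graph.
import numpy as np

W = np.array([[0.0, 0.8, 0.0, 0.2],
              [0.8, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.9],
              [0.2, 0.0, 0.9, 0.0]])   # symmetric, non-negative, zero diagonal

L = np.diag(W.sum(axis=1)) - W         # degree matrix minus adjacency
print(np.linalg.eigvalsh(L))           # PSD, one zero eigenvalue per component
```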

The topological structure of the underlying graph, learned from observed data or specified by a priori knowledge, can also be exploited to improve the performance of traditional GPR. In [14], a graph Gaussian process (GGP) model has been developed for classification tasks. Outputs of a GP corresponding to different inputs are locally averaged over their neighborhoods in a given relational graph to obtain latent variables, which are then employed to predict the class of the inputs. In [15], [16], the authors considered the scenario where each sample input corresponds to a vector output assumed to lie on a graph. However, in many regression tasks, only a scalar target value needs to be predicted. In this situation, the topological structure of a graph learned from scalar target values becomes unreliable. On the other hand, since each sample input in traditional GPR is generally multi-dimensional, the topological structure of the sample inputs is more informative.

In this paper, we propose a novel GPR model, where multi-dimensional sample inputs are viewed as signals generated over a weighted graph. A joint learning framework is developed to simultaneously estimate the covariance matrix of target values and the underlying graph of sample inputs. In this way, more topological information about the inputs can be exploited to estimate the covariance matrix of target values and thus improve the prediction accuracy of the resulting GP model. In addition, we develop novel numerical approaches for the proposed models, which provide more reliable and efficient graph estimation than state-of-the-art methods. The paper is organized as follows. In Section 2, we review traditional GPR and some related work. The fundamentals of graphs and the proposed joint learning framework are introduced in Section 3, where alternating optimization algorithms are also developed to tackle the resulting problems. Experimental results obtained from three sets of real data are presented in Section 4. Finally, Section 5 concludes the paper.

Section snippets

Traditional GPR

Let $X = [x_1, x_2, \ldots, x_N]^{\top} \in \mathbb{R}^{N \times D}$ denote $N$ training inputs and $y = [y_1, y_2, \ldots, y_N]^{\top} \in \mathbb{R}^{N}$ consist of $N$ target values, each $y_i$ associated with $x_i$. For a regression problem, each $y_i$ is modeled as the output of an unknown function corrupted by additive noise, that is, $y_i = g(x_i) + \varepsilon$, where $\varepsilon$ is supposed to follow a zero-mean isotropic Gaussian distribution, i.e., $p(\varepsilon) \sim \mathcal{N}(0, \sigma^2)$. Then, the probability density function (PDF) of $y$ is expressed as $p(y) \sim \mathcal{N}(\mu, K_{N,N} + \sigma^2 I)$, where $K_{N,N} \in \mathbb{R}^{N \times N}$ denotes the covariance matrix of target values
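For concreteness, here is a minimal sketch of standard GP posterior prediction under the model above, assuming a zero prior mean for brevity; data and hyper-parameter values are illustrative only.

```python
# Standard GP regression posterior: mean and variance at test inputs.
import numpy as np

def se_kernel(A, B, length=0.7, amp=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return amp**2 * np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))
sigma = 0.1
y = np.sin(X[:, 0]) + sigma * rng.normal(size=40)
Xs = np.linspace(-3, 3, 100)[:, None]              # test inputs

K = se_kernel(X, X) + sigma**2 * np.eye(40)        # K_NN + sigma^2 I
Ks = se_kernel(Xs, X)                              # test/train cross-covariance
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = Ks @ alpha                                  # posterior predictive mean
v = np.linalg.solve(L, Ks.T)
var = se_kernel(Xs, Xs).diagonal() - np.sum(v**2, axis=0)  # predictive variance
```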

Joint learning framework

Modern data analysis and processing typically involve a large volume of structured data, where the structure carries critical information about the essence of the data [17]. Graphs offer a useful way to describe relationships in complex datasets. They have become a powerful mathematical model and a practical tool of modern data analysis [18], [19], [20], [21]. A weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ consists of a finite vertex set $\mathcal{V} = \{1, \ldots, N\}$ and an edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ [22]. Every edge is often associated with a
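The smoothness assumption used in this framework is commonly quantified by $\mathrm{tr}(X^{\top} L X)$; the sketch below, on made-up data, checks its equivalence to the pairwise form $\frac{1}{2}\sum_{i,j} w_{ij} \|x_i - x_j\|_2^2$.

```python
# Graph-smoothness criterion tr(X^T L X) for vertex signals X (N x D).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 5))            # 4 vertices, 5-dimensional signals
W = np.array([[0.0, 0.8, 0.0, 0.2],
              [0.8, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.9],
              [0.2, 0.0, 0.9, 0.0]])
L = np.diag(W.sum(1)) - W              # combinatorial Laplacian

smoothness = np.trace(X.T @ L @ X)
# Equivalent pairwise form: 1/2 * sum_ij w_ij ||x_i - x_j||^2.
pairwise = 0.5 * sum(W[i, j] * np.sum((X[i] - X[j])**2)
                     for i in range(4) for j in range(4))
assert np.isclose(smoothness, pairwise)
```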

Experimental setup

Three real-world datasets are used to evaluate the prediction performance of the proposed GPR algorithms. The normalized mean square error (NMSE), defined as $\mathrm{NMSE} = 10 \log_{10}\!\left(\frac{\| \boldsymbol{m} - \boldsymbol{y}^{*} \|_{2}^{2}}{\| \boldsymbol{y}^{*} \|_{F}^{2}}\right)$, is adopted to measure prediction accuracy [3]. In our experiments, three representative kernel functions are employed in (3) (a numerical sketch of the NMSE metric follows the kernel list below):

  • a) Isotropic SE kernel (SEiso for short), or radial basis function (RBF) kernel:

    $$k(x_i, x_j) = \delta_f^2 \cdot \exp\!\left(-\frac{\|x_i - x_j\|_2^2}{2 d^2}\right).$$

    The hyper-parameters $w$ include $\delta_f$ and $d$ in the above kernel function, and standard
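As noted above, here is a small sketch of the NMSE metric, assuming $\boldsymbol{m}$ is the vector of predicted means and $\boldsymbol{y}^{*}$ the ground-truth targets; these readings of the symbols are ours, not stated in the snippet.

```python
# NMSE in dB: 10 * log10(||m - y*||^2 / ||y*||^2); more negative = more accurate.
import numpy as np

def nmse_db(m, y_star):
    return 10 * np.log10(np.sum((m - y_star)**2) / np.sum(y_star**2))

y_star = np.array([1.0, 2.0, 3.0])   # ground-truth test targets
m = np.array([1.1, 1.9, 3.2])        # predicted means
print(nmse_db(m, y_star))
```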

Conclusions

In this paper, we have proposed a novel joint learning framework, which combines GPR with graph learning, such that topological information can be effectively used to improve the prediction accuracy of the resulting GP models. Two strategies have been developed for constructing graphs of sample inputs. The resulting problems can be tackled by an alternating optimization scheme. Theoretical analyses regarding optimal solutions to the underlying graph learning problems have also been presented to

CRediT authorship contribution statement

Xiaoyu Miao: Methodology. Aimin Jiang: Formal analysis. Yanping Zhu: Software. Hon Keung Kwan: Writing – original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported in part by the National Key Research and Development Program under grant 2018AAA0100800 and the National Natural Science Foundation of China under grant 61801055.

References (31)

  • J. Belda et al.

    Estimating the Laplacian matrix of Gaussian mixtures for signal processing on graphs

    Signal Process.

    (2018)
  • J. Quinonero-Candela et al.

A unifying view of sparse approximate Gaussian process regression

    J. Mach. Learn. Res.

    (2005)
  • C.K. Williams et al.

    Gaussian Processes for Machine Learning

    (2006)
  • A. Wilson et al.

    Gaussian process kernels for pattern discovery and extrapolation

    International Conference on Machine Learning

    (2013)
  • B.W. Silverman

    Some aspects of the spline smoothing approach to non-parametric regression curve fitting

    J. R. Stat. Soc.

    (1985)
  • E. Snelson et al.

    Sparse Gaussian processes using pseudo-inputs

    Adv. Neural Inf. Process. Syst.

    (2006)
  • A. Wilson et al.

Kernel interpolation for scalable structured Gaussian processes (KISS-GP)

    International Conference on Machine Learning

    (2015)
  • Y. Saatçi

    Scalable Inference for Structured Gaussian Process Models

    (2012)
  • A.G. Wilson et al.

    Fast kernel learning for multidimensional pattern extrapolation

    NIPS

    (2014)
  • M. Yadav et al.

    Faster kernel interpolation for Gaussian processes

    International Conference on Artificial Intelligence and Statistics

    (2021)
  • X. Miao et al.

    Gaussian processes regression with joint learning of precision matrix

    2020 28th European Signal Processing Conference (EUSIPCO)

    (2021)
  • X. Dong et al.

    Learning graphs from data: a signal representation perspective

    IEEE Signal Process. Mag.

    (2019)
  • H.E. Egilmez et al.

    Graph learning from data under Laplacian and structural constraints

    IEEE J. Sel. Top. Signal Process.

    (2017)
  • Y.C. Ng, N. Colombo, R. Silva, Bayesian semi-supervised learning with graph Gaussian processes,...
  • A. Venkitaraman et al.

    Predicting graph signals using kernel regression where the input signal is agnostic to a graph

    IEEE Trans. Signal Inf. Process. Netw.

    (2019)