1 Introduction

In spatial data analysis, a common problem is to recover the value of a target variable, for example temperature or ozone concentration, over a two-dimensional domain. Usually the domain has an irregular shape and the variable is observed only at discrete locations. There are two main approaches to this problem. One approach treats the value of the target variable at each location as a random variable and uses a covariance function or a variogram to model the correlation among these random variables; the other uses a deterministic smooth surface function to describe the variations and connections among values at different locations. This work takes the latter approach.

To be more specific, we are interested in estimating a smooth function \(f(x,y)\) over some bounded domain \(\Omega \subseteq R^2\) given observations \(\{z_i\}_{i=1}^n\) at a collection of discrete points \(\{v_i=(x_i,y_i)\}_{i=1}^n\) in the domain. We assume that the observations \(z_i\) at locations \((x_i, y_i)\in R^2\) satisfy

$$\begin{aligned} z_i=f(x_i, y_i)+\epsilon _i, \qquad i=1,2,\ldots ,n, \end{aligned}$$

where the \(\epsilon _i\)'s are zero-mean random variables. While studying this two-dimensional smoothing problem, we pay special attention to the following two practical issues: (1) the data need not be evenly distributed, so observations can be dense at some locations while sparse at others; (2) the domain can take a non-rectangular shape and may contain holes, but it should be connected.

In this work we introduce bivariate splines on triangulations to handle irregular domains and propose to extend the idea of univariate penalized splines (Eilers and Marx 1996) to the two-dimensional case. The bivariate splines we consider have several advantages. First, they are rich enough to provide good approximations of smooth functions over complicated domains. Second, they can easily be incorporated into many widely used functional data models tailored to different data structures. Third, the computational costs of spline evaluation and parameter estimation are manageable. Bivariate splines are well developed in applied mathematics (Lai and Schumaker 2007) but not well known in the statistics community.

Although not as extensively studied as univariate smoothing, two-dimensional smoothing over irregular domains has been the subject of research by many authors. Using roughness penalties (Green and Silverman 1994) has been the dominant approach. To deal with irregular domains, Wang and Ranalli (2007) applied low-rank thin-plate splines defined as functions of the geodesic distance instead of the Euclidean distance, while Eilers (2006) employed the Schwarz–Christoffel transform to map complex domains to regular ones. Other authors dealt with irregular domains directly. Ramsay (2002) formulated a penalized least squares problem with a Laplacian penalty and transformed it into a system of partial differential equations (PDEs), which in turn can be solved using the finite element method. In order to make the method work, however, Ramsay (2002) made the rather strong assumption that the gradient of the estimated function is zero along directions normal to the boundary at all boundary points. To remove this undesirable assumption, Wood et al. (2008) developed the soap film smoother and showed empirically its superior performance over Ramsay's method. The penalized bivariate spline method we consider in this paper is related to the triogram model of Hansen et al. (1998), in which linear splines on triangulations are used to model bivariate functions and triangulations are built in a data-adaptive fashion. Different from their work, however, we follow the idea of penalized splines: we use a pre-specified triangulation and employ roughness penalties to regularize the spline fit. To define roughness penalties, we use bivariate splines for which derivatives are well defined. Our penalized bivariate spline method is computationally simple, but it lacks the spatial adaptivity of the triogram model of Hansen et al. (1998). We show, using a simulation study, that our method is competitive with existing methods such as the soap film smoother and the thin-plate regression spline. Another related work is Koenker and Mizera (2004), where a total variation penalty was used and turned out to be convenient for quantile regression.

The rest of the paper is organized as follows. Section 2 reviews the background about bivariate splines. Section 3 presents the details of the penalized bivariate spline method. Section 4 discusses some implementation details. Section 5 contains numerical results. Section 5.1 presents simulation results comparing our method with the soap film smoother of Wood et al. (2008) and the thin-plate regression spline of Wood (2003). Section 5.2 illustrates our method using a real dataset.

2 Bivariate splines on triangulations

This section reviews the background on bivariate splines on triangulations, following the excellent book of Lai and Schumaker (2007). Section 2.1 introduces barycentric coordinates and Bernstein basis polynomials with respect to a triangle. Section 2.2 gives formulas for computing directional derivatives and for imposing smoothness across the shared edge of two adjacent triangles. Section 2.3 discusses the construction of bivariate splines on a triangulation.

2.1 Barycentric coordinates and Bernstein basis polynomials

Given a non-degenerate triangle \(A\) with vertices \(v_1,v_2,v_3\) numbered counter-clockwise, any point \(v\in R^2\) can be written as

$$\begin{aligned} v=b_1v_1+b_2v_2+b_3v_3,\quad \hbox {with}\,\, b_1+b_2+b_3=1. \end{aligned}$$
(1)

The coefficients \((b_1,b_2,b_3)\) are called the barycentric coordinates of the point \(v\) with respect to the triangle \(A\), denoted as \(b_v=(b_1,b_2,b_3)\). Here the constraint \(b_1+b_2+b_3=1\) guarantees that the representation of \(v\) is unique. Although all points in \(R^2\) can be represented as in (1), where the \(b_i\)'s may take negative values, we only consider points inside or on the edges of the triangle \(A\), where the \(b_i\)'s are all nonnegative.

The constraint \(b_1+b_2+b_3=1\) on the barycentric coordinates makes them uniquely defined. Other constraints could be used for identifiability, but we adopt this one because it gives the barycentric coordinates an appealing geometric interpretation. When the point \(v\) lies inside or on the edges of the triangle \(A\), connecting \(v\) with \(v_1, v_2\) and \(v_3\) generates three triangles \(\{A_1,A_2,A_3\}\) as shown in Fig. 1, and the barycentric coordinates are

$$\begin{aligned} b_i=\frac{\hbox {Area of} \,A_i}{\hbox {Area of}\, A},\qquad i=1,2,3. \end{aligned}$$
(2)

One important property of barycentric coordinates is that they are invariant under affine transformations of the Cartesian coordinates. Since bivariate splines and all subsequent model equations are built on barycentric coordinates, they inherit this invariance.
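To make this concrete, the following minimal sketch (in Python with NumPy; the function name is ours and not from any package used in the paper) computes barycentric coordinates by solving the \(3\times 3\) linear system implied by (1); the result can be checked against the area-ratio formula (2).

```python
import numpy as np

def barycentric_coords(tri, v):
    """Barycentric coordinates of a point v w.r.t. a triangle.

    tri : (3, 2) array with rows v1, v2, v3 (Cartesian coordinates).
    v   : (2,)  array, the query point.
    Solves b1*v1 + b2*v2 + b3*v3 = v together with b1 + b2 + b3 = 1,
    i.e. the 3x3 linear system implied by Eq. (1).
    """
    A = np.vstack([tri.T, np.ones(3)])   # rows: x-coords, y-coords, ones
    return np.linalg.solve(A, np.array([v[0], v[1], 1.0]))

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(barycentric_coords(tri, np.array([0.25, 0.25])))   # [0.5, 0.25, 0.25]
```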

Fig. 1

Illustration of barycentric coordinates on a triangle

Given a triangle \(A\) and a point \(v\in A\) with barycentric coordinates \(b_v=(b_1,b_2,b_3)\), for a positive integer \(d\) and nonnegative integers \(i, j, k\) summing to \(d\), Bernstein basis polynomials of degree \(d\) with respect to the triangle \(A\) are defined as

$$\begin{aligned} B_{d;ijk}(v):= \frac{d!}{i!j!k!} b_1^ib_2^jb_3^k, \qquad b_v=(b_1,b_2,b_3),\, b_1+b_2+b_3=1,\, v\in A. \end{aligned}$$
(3)

Note that the barycentric coordinates \(b_1, b_2, b_3\) are affine functions of the Cartesian coordinates. It follows that each \(B_{d;ijk}(v)\) is a polynomial of degree \(d\) in the Cartesian coordinates. Let \(\mathbb P _d(A)\) be the space of polynomials of degree \(d\) defined on the triangle \(A\); then \(B_{d;ijk}\in \mathbb P _d(A)\). The set of Bernstein basis polynomials

$$\begin{aligned} \varvec{B}_{d,A}:=\{B_{d;ijk},\, i,\,j,\,k\ge 0,\, i+j+k =d\} \end{aligned}$$
(4)

forms a basis of the space \(\mathbb P _d(A)\) and satisfies

  1. \(\sum \limits _{i+j+k=d}B_{d;ijk}(v)=1\) for all \(v\in A\);

  2. \(0\le B_{d;ijk}(v) \le 1\) for all \(v \in A\);

  3. \(B_{d;ijk}\) has a unique maximum at the point \(\xi _{ijk}=(iv_1+jv_2+kv_3)/d\in A\).

(See Section 2.3 of Lai and Schumaker 2007.) Figure 2 gives an example of Bernstein basis polynomials for \(d=2\).

Fig. 2

Contour plots of the complete collection of six Bernstein polynomials when the domain is a triangle and \(d=2\). The function values vary between 0 and 1

Since \(\varvec{B}_{d,A}\) is a basis for \(\mathbb P _d(A)\), for any function \(s \in \mathbb P _d(A)\), there exist coefficients \(\{c_{ijk}\}\) such that

$$\begin{aligned} s(v)=\sum \limits _{i+j+k=d} c_{ijk}B_{d;ijk}(v). \end{aligned}$$
(5)

This representation is usually called the B-form of \(s\) relative to the triangle \(A\). For computer implementation, we need an ordering of the coefficients in (5). In this paper we use the lexicographical order, that is, \(c_{\nu \mu \kappa }\) comes before \(c_{ijk}\) provided \(\nu > i\), or if \(\nu =i\), then \(\mu >j\), or if \(\nu =i\) and \(\mu =j\), then \(\kappa >k\). Using this ordering, (5) can be written in vector form

$$\begin{aligned} s(v)=\varvec{B}_{d,A}^T(v)\, \mathbf{c}, \end{aligned}$$
(6)

where

$$\begin{aligned} \varvec{B}_{d,A}(v)=(B_{d;d,0,0}(v), B_{d;d-1,1,0}(v),B_{d;d-1,0,1}(v), B_{d;d-2,2,0}(v), \ldots , B_{d;0,0,d}(v))^T \end{aligned}$$
(7)

and

$$\begin{aligned} \mathbf{c}=(c_{d,0,0}, c_{d-1,1,0},c_{d-1,0,1}, c_{d-2,2,0},\ldots , c_{0,0,d})^T. \end{aligned}$$
(8)

In (8), \(c_{ijk}\) corresponds to the \(l\)th element of vector \(\mathbf{c}\) where

$$\begin{aligned} l=\Big \{\sum \limits _{m=0}^{d-i} (m+1)\Big \}-j=\frac{(d-i+1)(d-i+2)}{2}-j. \end{aligned}$$
(9)

Note that choosing a different ordering of the elements of \(\mathbf{c}\) does not affect the value of the polynomial, provided the basis vector in (7) is reordered accordingly.
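As an illustrative sketch (Python with NumPy; the helper names bform_indices and bernstein_vector are ours), the following code enumerates the index triples in the lexicographic order of (7) and evaluates the B-form (6):

```python
import numpy as np
from math import factorial

def bform_indices(d):
    """Triples (i, j, k), i + j + k = d, in the lexicographic order of (7):
    larger i first, then larger j."""
    return [(i, j, d - i - j) for i in range(d, -1, -1)
                              for j in range(d - i, -1, -1)]

def bernstein_vector(d, b):
    """The vector B_{d,A}(v) of (7), evaluated at barycentric coordinates b."""
    b1, b2, b3 = b
    return np.array([factorial(d) / (factorial(i) * factorial(j) * factorial(k))
                     * b1**i * b2**j * b3**k
                     for i, j, k in bform_indices(d)])

# Evaluate the B-form (6), s(v) = B_{d,A}(v)^T c, at the barycenter.
d = 2
c = np.arange(1.0, 7.0)                  # six coefficients, ordered as in (8)
b = (1/3, 1/3, 1/3)
s = bernstein_vector(d, b) @ c
assert np.isclose(bernstein_vector(d, b).sum(), 1.0)   # property 1)
```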

Barycentric coordinates and Bernstein basis polynomials are convenient for deriving conditions for continuously connecting polynomials defined on adjacent triangles. Assume that we have two triangles \(A_1:={<}v_1,v_2,v_3{>}\) and \(A_2:={<}v_4,v_3,v_2{>}\) sharing a common edge \(e={<}v_2,v_3{>}\) with the Bernstein basis polynomials \(\{B_{d;ijk}^{(1)}\}_{i+j+k=d}\) defined on \(A_1\) and \(\{B_{d;ijk}^{(2)}\}_{i+j+k=d}\) on \(A_2\) using the corresponding barycentric coordinates. See Fig. 3 for an illustration. Consider degree-\(d\) polynomials \(p_1(v)\) defined on \(A_1\) and \(p_2(\tilde{v})\) on \(A_2\), written in the B-form

$$\begin{aligned} p_1(v)=\sum \limits _{i+j+k=d}c_{ijk}^{(1)}B_{d;ijk}^{(1)}(v) \quad \text {and}\quad p_2(\tilde{v})=\sum \limits _{i+j+k=d}c_{ijk}^{(2)}B_{d;ijk}^{(2)}(\tilde{v}). \end{aligned}$$
(10)

Assume the point \(v\) is on the common edge \(e\). The barycentric coordinates of \(v \in e\) with respect to \(A_1\) and \(A_2\) are, respectively, \((0,b_2,1-b_2)\) and \((0,1-b_2,b_2)\). Therefore the polynomials \(p_1(v)\) and \(p_2(\tilde{v})\) reduce to the univariate functions

$$\begin{aligned} p_1(v)=\sum \limits _{j+k=d} c^{(1)}_{0jk}\frac{d!}{j!k!} b_2^j(1-b_2)^k \end{aligned}$$

and

$$\begin{aligned} p_2(\tilde{v})=\sum \limits _{j+k=d} c^{(2)}_{0jk}\frac{d!}{j!k!} (1-b_2)^j(b_2)^k. \end{aligned}$$

In this case, \(p_1\) and \(p_2\) join continuously on edge \(e\) if and only if \(c^{(1)}_{0jk}=c^{(2)}_{0kj} \) for \(j,k \ge 0\) and \(j+k =d\).
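As a quick numerical check of this condition (a sketch with arbitrarily chosen coefficients, not code from the paper), one can verify for \(d=2\) that setting \(c^{(1)}_{0jk}=c^{(2)}_{0kj}\) makes the two edge restrictions agree:

```python
import numpy as np
from math import factorial

d = 2
c1 = {(0, 2, 0): 1.5, (0, 1, 1): -0.3, (0, 0, 2): 0.7}   # edge coefficients of p1
c2 = {(0, k, j): val for (i, j, k), val in c1.items()}    # impose c2_{0kj} = c1_{0jk}

def edge_poly(c, b2, flip):
    # Restriction to the edge; A2 sees the edge parameter b2 reversed.
    t = 1 - b2 if flip else b2
    return sum(val * factorial(d) / (factorial(j) * factorial(k))
               * t**j * (1 - t)**k for (i, j, k), val in c.items())

for b2 in np.linspace(0.0, 1.0, 5):
    assert np.isclose(edge_poly(c1, b2, False), edge_poly(c2, b2, True))
```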

Fig. 3

First example of two triangles sharing an edge

2.2 Directional derivatives and smoothness

To facilitate the introduction of smoothness restrictions across edges of two triangles when we define bivariate splines, we give expressions for the directional derivatives of Bernstein basis polynomials. The directional derivative of a smooth multivariate function \(f\) at a point \(v\) in the direction \(w\) is defined as

$$\begin{aligned} D_w f(v):= \frac{\partial }{\partial t} f(v+tw)\bigg \vert _{t=0} =\lim _{t \rightarrow 0} \frac{f(v+tw)-f(v)}{t}. \end{aligned}$$
(11)

Higher order derivatives can be defined recursively.

Assume that the direction \(w= u_1-u_2\), where \(u_1\) and \(u_2\) are two points with barycentric coordinates \((\alpha _1,\alpha _2,\alpha _3)\) and \((\beta _1,\beta _2,\beta _3)\) with respect to the triangle \(A\). Then \(w\) is uniquely described by the triple \(a=(a_1,a_2,a_3)\) with \(a_i = \alpha _i-\beta _i\) and \(a_1+a_2+a_3 =0\). From the definition and a direct calculation, the directional derivative of the Bernstein basis polynomial \(B_{d;ijk}\) has the expression

$$\begin{aligned} D_w B_{d;ijk}(v) = d\,[a_1 B_{d-1;(i-1)jk}(v)+a_2 B_{d-1;i(j-1)k}(v)+a_3 B_{d-1;ij(k-1)}(v)]. \end{aligned}$$
(12)

Thus, for \(s \in P_d(A)\) expressed in the B-form (5), the directional derivative has the expression

$$\begin{aligned} D_w s(v)&= \sum \limits _{i+j+k=d}c_{ijk} D_w B_{d;ijk}(v) \\&= \sum \limits _{i+j+k=d} c_{ijk}\, d \,[a_1 B_{d-1;(i-1)jk}(v)+a_2 B_{d-1;i(j-1)k}(v) +a_3 B_{d-1;ij(k-1)}(v)]\\&= d \sum \limits _{i+j+k=d-1} c_{ijk}^{(1)}(a) B_{d-1;ijk}(v), \end{aligned}$$

where \(c^{(1)}_{ijk}(a) =a_1c_{i+1,j,k}+a_2c_{i,j+1,k}+a_3c_{i,j,k+1}\). In general, define \(c^{(0)}_{ijk}(a)=c_{ijk}\) for a direction \(w\) described by \(a=(a_1, a_2, a_3)\), and recursively

$$\begin{aligned} c_{ijk}^{(m)} (a):= a_1 c_{i+1,j,k}^{(m-1)}(a)+a_2 c_{i,j+1,k}^{(m-1)}(a)+a_3 c_{i,j,k+1}^{(m-1)}(a),\qquad \text {for } \,m=1,\ldots ,d. \end{aligned}$$
(13)

Then, the \(m\)th order directional derivative has the expression

$$\begin{aligned} D_w^m s(v) = \frac{d !}{(d-m)!}\sum \limits _{i+j+k=d-m} c_{ijk}^{(m)}(a)B_{d-m;ijk}(v). \end{aligned}$$
(14)

(See Theorem 2.15 of Lai and Schumaker 2007.)
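The coefficient recursion (13)–(14) translates directly into code. The sketch below (Python; it reuses bform_indices and bernstein_vector from the earlier sketch, and the function names are ours) computes \(D_w^m s(v)\) at given barycentric coordinates:

```python
from math import factorial

def dir_deriv_coeffs(c, d, a, m):
    """Apply the recursion (13) m times to B-form coefficients.

    c : dict mapping (i, j, k), i + j + k = d, to c_{ijk}
    a : triple (a1, a2, a3) describing the direction w, a1 + a2 + a3 = 0
    Returns {(i, j, k), i + j + k = d - m : c^{(m)}_{ijk}(a)}.
    """
    a1, a2, a3 = a
    for step in range(m):
        dd = d - step - 1                 # degree after this differentiation
        c = {(i, j, dd - i - j):
                 a1 * c[(i + 1, j, dd - i - j)]
               + a2 * c[(i, j + 1, dd - i - j)]
               + a3 * c[(i, j, dd - i - j + 1)]
             for i in range(dd, -1, -1) for j in range(dd - i, -1, -1)}
    return c

def dir_deriv(c, d, a, m, b):
    """m-th order directional derivative (14) at barycentric coordinates b."""
    cm = dir_deriv_coeffs(c, d, a, m)
    basis = bernstein_vector(d - m, b)    # from the earlier sketch
    return (factorial(d) / factorial(d - m)
            * sum(cm[ijk] * B for ijk, B in zip(bform_indices(d - m), basis)))

# Sanity check: s = 1 (all coefficients 1) has zero directional derivative.
c = {ijk: 1.0 for ijk in bform_indices(2)}
assert abs(dir_deriv(c, d=2, a=(1.0, -1.0, 0.0), m=1, b=(1/3, 1/3, 1/3))) < 1e-12
```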

Consider two triangles \(A_1:={<}v_1,v_2,v_3{>}\) and \(A_2:={<}v_4,v_3,v_2{>}\) sharing a common edge \(e={<}v_2,v_3{>}\) as depicted in Fig. 3 and two degree-\(d\) polynomials \(p_1\) and \(p_2\) defined separately on \(A_1\) and \(A_2\). For any direction \(w\), let \(D_w^l p(v)\) denote the \(l\)th order derivative in the direction \(w\) at the point \(v\). We say that \(p_1\) and \(p_2\) connect smoothly on the common edge \(e\) with order \(r\) if their derivatives up to order \(r\) in all directions agree at every point on \(e\), that is,

$$\begin{aligned} D^l_w p_1(v)=D^l_w p_2(v),\qquad \text {all } v\in e\,\, \text {and} \;l=0,\ldots ,r, \end{aligned}$$
(15)

and for all directions \(w\). The conditions for the smooth connection of two polynomials can be expressed in terms of the coefficients of their B-form representations. Assume that \(p_1\) and \(p_2\) have the B-form representations given in (10). Then, \(p_1\) and \(p_2\) are smoothly connected on edge \(e\) with order \(r\) if and only if

$$\begin{aligned} c^{(2)}_{ljk}=\sum \limits _{\nu +\mu +\kappa =l}c^{(1)}_{\nu ,k+\mu ,j+\kappa } B^{(1)}_{l;\nu \mu \kappa }(v_4),\qquad j+k=d-l,\,\,l=0,\ldots ,r, \end{aligned}$$
(16)

where \(v_4\) is the vertex of \(A_2\) that is not on \(e\), and \(c^{(1)}_{ljk}\) and \(c^{(2)}_{ljk}\) are the coefficients in the B-form expressions (10). (See Theorem 2.28 of Lai and Schumaker 2007.) The condition for continuously connecting two polynomials discussed at the end of the previous subsection is the special case of (16) with \(r=0\).

2.3 Bivariate splines on a triangulation

Consider a collection of \(K\) triangles \(\Delta :=\{A_1,\ldots , A_K\}\) that covers the domain \(\Omega \). This collection \(\Delta \) is called a triangulation of the domain, provided that if a pair of triangles in \(\Delta \) intersect, then their intersection is either a common vertex or a common edge. Let \(d\) be a positive integer. Given a triangulation \(\Delta \), define the space of degree-\(d\) piecewise polynomials on \(\Delta \) to be the set of functions that are degree-\(d\) polynomials when restricted to each triangle in \(\Delta \), that is,

$$\begin{aligned} \mathbb S _d(\Delta )=\{s: s|_{A_k}\in \mathbb P _d(A_k), k=1, \ldots , K\}, \end{aligned}$$
(17)

where \(s|_{A_k}\) denotes the restriction of \(s\) to the triangle \(A_k\). Since piecewise polynomials can be discontinuous across shared edges of adjacent triangles, they are in general not ideal for smoothing noisy data. We thus consider the subset of piecewise polynomials that satisfy global smoothness restrictions. For a nonnegative integer \(r\), let \(\mathbb C ^r(\Delta )\) denote the collection of functions that are \(r\) times continuously differentiable over the triangulated domain. Define the \(r\)th-order smooth spline space \(\mathbb S _d^r(\Delta )\) as

$$\begin{aligned} \mathbb S _d^r(\Delta ) = \mathbb S _d(\Delta ) \cap \mathbb C ^r(\Delta ) = \{s\in \mathbb C ^r(\Delta ): s|_{A_k}\in \mathbb P _d(A_k), k=1, \ldots , K\}. \end{aligned}$$
(18)

Now we discuss how to represent bivariate splines using a basis expansion. On each triangle \(A_k\in \Delta \), let \(\varvec{B}_{d,k}\) be the set of degree-\(d\) Bernstein basis polynomials. Note the slight abuse of notation here: the subscript \(k\) in \(\varvec{B}_{d,k}\) indicates that this collection of Bernstein basis polynomials corresponds to the triangle \(A_k\), and is therefore different from the index \(k\) in (4). Since \(\varvec{B}_{d,k}\) is a basis of \(\mathbb P _d(A_k),\, k=1, \ldots , K\), the union of these sets, \(\{\varvec{B}_{d,k}\}_{k=1}^K\), forms a basis of \(\mathbb S _d(\Delta )\). Thus for any \(s\in \mathbb S _d(\Delta )\), there exists a vector \(\mathbf{c}_{k}\) as in (8) such that

$$\begin{aligned} s|_{A_k}= \varvec{B}_{d,k}^T\mathbf{c}_{k}, \qquad k=1, \ldots , K, \end{aligned}$$

and

$$\begin{aligned} s=\varvec{B}_d^T\mathbf{c}, \end{aligned}$$

where \(\varvec{B}_d = (\varvec{B}^T_{d,1}, \varvec{B}^T_{d,2},\ldots , \varvec{B}^T_{d,K})^T\) and \(\mathbf{c}=(\mathbf{c}_1^T, \mathbf{c}_2^T,\ldots , \mathbf{c}_K^T)^T\).

It is well known that B-splines provide easy-to-calculate, locally supported basis functions for representing univariate splines. However, constructing locally supported basis functions for bivariate splines is much more complicated and can only be done on a case-by-case basis. Thus we adopt a different strategy in this paper: we use the Bernstein basis polynomials to represent bivariate splines and impose constraints on the coefficients of the basis expansion to meet the smoothness requirement. Specifically, we repeatedly apply (16) to all shared edges in the triangulation to obtain a matrix \(\mathbf{H}\) such that \(\mathbf{H}\mathbf{c}= 0\). This constraint matrix \(\mathbf{H}\) depends on \(d\), \(r\) and the structure of the triangulation.

3 Penalized bivariate splines on triangulations

Section 3.1 presents the penalized least squares problem for smoothing noisy data and its solution strategy. Section 3.2 discusses the selection of penalty parameters.

3.1 Penalized least squares

The roughness penalty approach to smoothing noisy data (Green and Silverman 1994) minimizes the objective function

$$\begin{aligned} \sum _{i=1}^n \{z_i-f(x_i,y_i)\}^2+\lambda \, \mathsf{Pen}(f), \end{aligned}$$
(19)

where \(\mathsf{Pen}(f)\) is a roughness penalty which is small when the function is smooth and \(\lambda >0\) is a penalty parameter. A commonly used penalty for surface smoothing is the thin-plate spline penalty (Duchon 1977; Green and Silverman 1994)

$$\begin{aligned} \mathsf{Pen}_{1}(f) = \int \limits _\Omega (f_{xx}^2+2f_{xy}^2+f_{yy}^2) \,dxdy, \end{aligned}$$

where \(f_{xx}, f_{xy}, f_{yy}\) are second-order partial derivatives. Another penalty used in the literature is the squared Laplacian penalty (Ramsay 2002)

$$\begin{aligned} \mathsf{Pen}_{2}(f) = \int \limits _\Omega (f_{xx}+f_{yy})^2 \,dxdy. \end{aligned}$$

Both penalties are invariant to rotation and translation of the variables. This is desirable since the coordinate system used to describe the bivariate smoothing problem is usually arbitrary, and we want the minimizer of (19) not to depend on its choice. Note that these penalties involve second-order partial derivatives, so the degree of the bivariate splines should be at least two. Following the common practice of univariate smoothing, we use cubic splines in this paper.

The two roughness penalties differ in how a candidate function is penalized: \(\mathsf{Pen}_{2}\) contains no mixed second-order derivative term, and it squares the sum of the second-order derivative terms, whereas \(\mathsf{Pen}_{1}\) squares the terms separately. While only linear functions escape penalization by \(\mathsf{Pen}_{1}\), the space of functions not penalized by \(\mathsf{Pen}_{2}\) is infinite dimensional, since the second-order derivatives with respect to \(x\) and \(y\) can be traded off against each other. Note that the form of \(\mathsf{Pen}_{2}\) is crucial in Ramsay's reformulation of the data smoothing problem as a system of partial differential equations solved by the finite element method. Both penalties can be used in our penalized spline method, but a systematic comparison of the two is beyond the scope of this paper.

For the univariate smoothing problem, the penalized spline method (Eilers and Marx 1996) solves the penalized least squares problem in a finite-dimensional space of spline functions. Extending this approach to the bivariate case, we solve the minimization problem (19) while restricting the solution to a finite-dimensional space of bivariate splines. In particular, we minimize the objective function over \(\mathbb S _d^r(\Delta )\) for a triangulation \(\Delta \) of the domain \(\Omega \). As mentioned in Sect. 2.3, we use Bernstein basis polynomials to represent the bivariate splines and introduce the constraint matrix \(\mathbf{H}\) on the coefficients to enforce smoothness across shared edges of triangles. Write the target bivariate function as \(f(x,y) = \varvec{B}_d^T(x,y)\, \mathbf{c}\), where the vector of coefficients satisfies the constraint \(\mathbf{H}\mathbf{c}=0\). Let \(\mathbf{z}= (z_1, \ldots , z_n)^T\) be the vector of \(n\) observations of the response variable and denote \(\mathbf{B}= (\varvec{B}_d(x_1,y_1), \ldots , \varvec{B}_d(x_n, y_n))^T\). Then the optimization problem (19) reduces to

$$\begin{aligned} \mathop {\text {min}}_\mathbf{c}\{\Vert \mathbf{z}-\mathbf{B}\mathbf{c}\Vert ^2 + \lambda \, \mathbf{c}^T\mathbf{P}\mathbf{c}\}\qquad \text {subject to} \,\, \mathbf{H}\mathbf{c}=0, \end{aligned}$$
(20)

where \(\mathbf{P}\) is the penalty matrix such that \(\mathsf{Pen }(f) = \mathsf{Pen }(\varvec{B}_d^T\mathbf{c}) = \mathbf{c}^T\mathbf{P}\mathbf{c}\).

To solve the constrained minimization problem (20), we first remove the constraint via a QR decomposition of the transpose of \(\mathbf{H}\), converting the problem into a conventional penalized regression problem without restrictions. More specifically, we write

$$\begin{aligned} \mathbf{H}^T=\mathbf{Q}\mathbf{R}=[\mathbf{Q}_1\, \mathbf{Q}_2] \begin{bmatrix}\mathbf{R}_1\\\mathbf{R}_2\\ \end{bmatrix}, \end{aligned}$$
(21)

where \(\mathbf{Q}\) is an orthogonal matrix, \(\mathbf{R}\) is an upper triangular matrix, the submatrix \(\mathbf{Q}_1\) consists of the first \(r\) columns of \(\mathbf{Q}\), where \(r\) is the rank of \(\mathbf{H}\), and \(\mathbf{R}_2\) is a matrix of zeros. We reparametrize as \(\mathbf{c}=\mathbf{Q}_2\tilde{\mathbf{c}}\) for some \(\tilde{\mathbf{c}}\); then \(\mathbf{H}\mathbf{c}=\mathbf{R}_2^T\tilde{\mathbf{c}}=0\) is guaranteed. The problem (20) then becomes

$$\begin{aligned} \mathop {\text {min}}_{\tilde{\mathbf{c}}} \{\Vert \mathbf{z}-\mathbf{B}\mathbf{Q}_2\tilde{\mathbf{c}}\Vert ^2+\lambda (\mathbf{Q}_2\tilde{\mathbf{c}})^T\mathbf{P}(\mathbf{Q}_2\tilde{\mathbf{c}})\}. \end{aligned}$$
(22)

For a fixed penalty parameter \(\lambda \), if \(\hat{\tilde{\mathbf{c}}}\) solves (22), the solution of the original problem (20) is

$$\begin{aligned} \hat{\mathbf{c}}= \mathbf{Q}_2\hat{\tilde{\mathbf{c}}}=\mathbf{Q}_2\{\mathbf{Q}_2^T(\mathbf{B}^T\mathbf{B}+\lambda \mathbf{P})\mathbf{Q}_2\}^{-1}\mathbf{Q}_2^T\mathbf{B}^T\mathbf{z}. \end{aligned}$$
(23)

The bivariate smooth function is estimated as \(\hat{f}(x,y) = \varvec{B}_d^T(x,y)\hat{\mathbf{c}}\).
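To fix ideas, here is a minimal dense-linear-algebra sketch of (20)–(23) in Python with NumPy/SciPy; the function name is ours, and a practical implementation would exploit the sparsity discussed in Sect. 4.4:

```python
import numpy as np
from scipy.linalg import qr

def penalized_fit(B, z, P, H, lam):
    """Solve min ||z - Bc||^2 + lam * c'Pc subject to Hc = 0, as in (20)-(23)."""
    Q, R = qr(H.T)                       # full QR of H^T, as in (21)
    rank = np.linalg.matrix_rank(H)
    Q2 = Q[:, rank:]                     # orthonormal basis of {c : Hc = 0}
    BQ2 = B @ Q2                         # design matrix after reparametrization
    lhs = BQ2.T @ BQ2 + lam * (Q2.T @ P @ Q2)
    c_tilde = np.linalg.solve(lhs, BQ2.T @ z)
    return Q2 @ c_tilde                  # coefficient vector c_hat of (23)
```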

3.2 Selecting the penalty parameter

The penalty parameter \(\lambda \) controls the trade-off between model fit and model parsimony. A large value of \(\lambda \) enforces a smoother fitted function with potentially larger fitting errors, while a small value yields a rougher fitted function with potentially smaller fitting errors. Since in-sample fitting errors cannot gauge the predictive performance of the fitted function, one should target a criterion that mimics the out-of-sample performance of the fitted model. Generalized cross-validation (GCV) (Craven and Wahba 1979; Wahba 1990) is such a criterion and is widely used for choosing the penalty parameter. The closed-form expression of the GCV criterion is

$$\begin{aligned} \text {GCV}(\lambda )=\frac{n\,\Vert \mathbf{z}-\mathbf{A}(\lambda )\mathbf{z}\Vert ^2}{\{\mathbf{tr}(\mathbf{I}-\mathbf{A}(\lambda ))\}^2}, \end{aligned}$$
(24)

where \(\mathbf{A}(\lambda )\) is the hat matrix depending on \(\lambda \) with the form

$$\begin{aligned} \mathbf{A}(\lambda )=\mathbf{B}\mathbf{Q}_2\{\mathbf{Q}_2^T(\mathbf{B}^T\mathbf{B}+\lambda \mathbf{P})\mathbf{Q}_2\}^{-1}\mathbf{Q}_2^T\mathbf{B}^T. \end{aligned}$$
(25)

We employ a numerical optimization algorithm to find the \(\lambda \) value that minimizes the GCV criterion. Following Wood (2004), we search for the optimal \(\lambda \) by applying Newton's method to \(\eta =\log \lambda \). In detail, we replace \(\lambda \) with \(\exp \eta \) in (24); Newton's method then gives the \(k\)th update of \(\eta \) as

$$\begin{aligned} \eta ^{(k+1)}=\eta ^{(k)}-M_k^{-1}m_k, \end{aligned}$$
(26)

where \(m_k\) and \(M_k\) are the first and second derivatives of the GCV criterion with respect to \(\eta \), evaluated at \(\eta ^{(k)}\). Once the algorithm converges, we obtain the optimal \(\eta _{opt}\) and therefore \(\lambda _{opt}=\exp \eta _{opt}\). The transformation from \(\lambda \) to \(\eta \) guarantees that the chosen \(\lambda \) is positive.
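As a hedged sketch in Python with NumPy/SciPy (instead of implementing the Newton iteration with analytic derivatives, we use an off-the-shelf bounded one-dimensional search over \(\eta\); the arrays B, z, P and Q2 are assumed to come from the fitting sketch above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gcv(eta, B, z, P, Q2):
    """GCV criterion (24) at lam = exp(eta), via the hat matrix (25)."""
    lam = np.exp(eta)
    BQ2 = B @ Q2
    lhs = BQ2.T @ BQ2 + lam * (Q2.T @ P @ Q2)
    A = BQ2 @ np.linalg.solve(lhs, BQ2.T)        # hat matrix A(lam)
    n = len(z)
    resid = z - A @ z
    return n * (resid @ resid) / (n - np.trace(A)) ** 2

# One-dimensional search over eta = log(lam); the paper uses Newton's method.
res = minimize_scalar(gcv, bounds=(-10.0, 10.0), args=(B, z, P, Q2),
                      method='bounded')
lam_opt = np.exp(res.x)
```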

4 Implementation details

This section discusses some implementation details such as construction and representation of the triangulation, construction of the constraint matrix, and some computational issues.

4.1 Construction of triangulation

Our method requires the construction of a triangulation; we refer to Chapter 4 of Lai and Schumaker (2007) for a detailed discussion of triangulations. No single construction algorithm is appropriate for all purposes. We manually generated the triangulations for the numerical examples in this paper, following two general guidelines: avoid triangles with very small interior angles, and ensure that every triangle contains at least one data point. When applying the penalized spline method to univariate smoothing, the number of knots is not crucial in many applications as long as it is moderately large. Similarly, for the bivariate cases we consider here, we expect the results to be insensitive to the triangulation once enough triangles are used to capture the features of interest, since the roughness penalty regularizes the estimation. A triangulation with 20–40 triangles worked well in our simulation study.

4.2 Computer representation of triangulation

For computer implementation, we need to represent the structure of a triangulation concisely. Although there are other ways of doing so, we find the following approach convenient. We first label all vertices in the triangulation in an arbitrary order, with no duplicate labels, and then construct two lists. The first is the vertex list, whose \(i\)th row stores the Cartesian coordinates \((x_i, y_i)\) of the \(i\)th vertex. The second is the triangle list, whose \(i\)th row is a triple \((l_{i1},l_{i2},l_{i3})\) recording that the \(i\)th triangle is formed by the \(l_{i1}\)th, \(l_{i2}\)th and \(l_{i3}\)th points in the vertex list.

As an example, consider the triangulation of the region consisting of two adjacent triangles as shown in Fig. 4. Tables 1 and 2 give the two-list representation of the triangulation. According to the triangle list shown in Table 2, the first triangle \(A_1\) consists of vertices \(v_1,\, v_2,\, v_3\) and the second triangle \(A_2\) consists of \(v_1,\, v_3,\, v_4\). The locations of vertices \(v_1, v_2, v_3\), and \(v_4\) are recorded in the vertex list shown in Table 1. This representation is not unique; for example, the first row of the triangle list can be any permutation of \(1,2,3\). However, this non-uniqueness does not create any problem since using different representations does not change the structure of the triangulation.

Fig. 4

Second example of two triangles sharing an edge

Table 1 Vertex list for the triangulation in Fig. 4
Table 2 Triangle list for the triangulation in Fig. 4
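In code, the two-list representation can be stored as two arrays, as sketched below (Python with NumPy; the vertex coordinates are hypothetical placeholders for the values in Table 1, and we use 0-based indices instead of the 1-based labels of Table 2):

```python
import numpy as np

# Two-list representation of the triangulation in Fig. 4. The coordinates
# below are hypothetical stand-ins for the entries of Table 1; indices are
# 0-based, whereas Table 2 uses 1-based vertex labels.
vertices = np.array([[0.0, 0.0],    # v1
                     [1.0, 0.0],    # v2
                     [1.0, 1.0],    # v3
                     [0.0, 1.0]])   # v4
triangles = np.array([[0, 1, 2],    # A1 = <v1, v2, v3>
                      [0, 2, 3]])   # A2 = <v1, v3, v4>

corners = vertices[triangles[0]]    # (3, 2) Cartesian corners of triangle A1
```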

4.3 Construction of the constraint matrix

Equation (16) provides the basis for constructing the constraint matrix \(\mathbf{H}\). We explain the detailed steps of the construction through an example. Consider the triangulation in Fig. 4 and the construction of bivariate splines on this triangulation using degree-2 (\(d=2\)) piecewise polynomials. There are \((d+1)(d+2)/2=6\) Bernstein basis polynomials defined on each triangle, with corresponding coefficients denoted \(\{c_{ijk}^{(1)}\}\) and \(\{c_{ijk}^{(2)}\}\). Denote by \(\mathbf{c}=(c^{(1)}_{2,0,0}, c^{(1)}_{1,1,0}, c^{(1)}_{1,0,1}, c^{(1)}_{0,2,0}, c^{(1)}_{0,1,1}, c^{(1)}_{0,0,2}, c^{(2)}_{2,0,0},\ldots ,c^{(2)}_{0,0,2} )^T\) the concatenation of \(\{c_{ijk}^{(1)}\}\) and \(\{c_{ijk}^{(2)}\}\), each ordered as in (8).

Suppose we want to introduce a smoothness constraint such that the bivariate spline is continuous over the whole region. Applying (16) with \(l=r=0\), we obtain

$$\begin{aligned} c_{0jk}^{(1)}= c_{0kj}^{(2)}B_{0;000}^{(1)}(v_4)=c_{0kj}^{(2)}, \end{aligned}$$
(27)

for any non-negative integers \(k,j\) such that \(k+j=2\). The constraint matrix is

$$\begin{aligned} \mathbf{H}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{}} 0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}-1\\ 0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}-1&{}0\\ 0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}-1&{}0&{}0\\ \end{array}\right] . \end{aligned}$$

Suppose instead that we want the bivariate spline to have continuous first derivatives over the whole region. In addition to the equations in (27), we also need to apply (16) with \(l=1\). The resulting equations are

$$\begin{aligned} c_{1jk}^{(1)}&= c_{1kj}^{(2)}B^{(1)}_{1;100}(v_4)+c^{(2)}_{0,k+1,j}B^{(1)}_{1;010}(v_4)+c^{(2)}_{0,k,j+1}B^{(1)}_{1;001}(v_4) \nonumber \\&= c_{1kj}^{(2)}-c^{(2)}_{0,k+1,j} + c^{(2)}_{0,k,j+1}, \end{aligned}$$
(28)

for any non-negative integers \(k,j\) such that \(k+j=1\). Here we used the fact that the barycentric coordinates of \(v_4\) with respect to the triangle \(A_1\) are \((1,-1,1)\); since the degree-1 Bernstein basis polynomials are exactly the barycentric coordinates, the basis values in (28) reduce to these numbers. The equations in (27) and (28) together yield the following constraint matrix

$$\begin{aligned} \mathbf{H}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{}} 0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}0&{}-1\\ 0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}0&{}0&{}-1&{}0\\ 0&{}0&{}0&{}0&{}0&{}1&{}0&{}0&{}0&{}-1&{}0&{}0\\ 0&{}1&{}0&{}0&{}0&{}0&{}0&{}0&{}-1&{}0&{}1&{}-1\\ 0&{}0&{}1&{}0&{}0&{}0&{}0&{}-1&{}0&{}1&{}-1&{}0\\ \end{array}\right] . \end{aligned}$$
(29)

Here, the first three rows of \(\mathbf{H}\) correspond to (27) and the last two rows correspond to (28).
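As an illustrative sketch (Python with NumPy, reusing bform_indices from the sketch in Sect. 2.1), the following code assembles the five rows of (29) from the conditions (27) and (28):

```python
import numpy as np

d = 2
idx = {ijk: p for p, ijk in enumerate(bform_indices(d))}   # ordering of (8)
nb = len(idx)                       # 6 coefficients per triangle when d = 2
a = (1.0, -1.0, 1.0)                # barycentric coordinates of v4 w.r.t. A1

rows = []
# Continuity, Eq. (27): c1_{0jk} - c2_{0kj} = 0 for j + k = 2.
for j in range(d, -1, -1):
    k = d - j
    row = np.zeros(2 * nb)
    row[idx[(0, j, k)]] = 1.0
    row[nb + idx[(0, k, j)]] = -1.0
    rows.append(row)
# First-order smoothness, Eq. (28), rewritten in the form (row . c) = 0.
for j in range(d - 1, -1, -1):
    k = d - 1 - j
    row = np.zeros(2 * nb)
    row[idx[(1, j, k)]] = 1.0
    row[nb + idx[(1, k, j)]] = -a[0]
    row[nb + idx[(0, k + 1, j)]] = -a[1]
    row[nb + idx[(0, k, j + 1)]] = -a[2]
    rows.append(row)
H = np.vstack(rows)                 # reproduces the 5 x 12 matrix in (29)
```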

4.4 Some computation issues

Both the calculation of the coefficient estimate (23) and the evaluation of the GCV criterion (24) involve the matrix \(\mathbf{B}^T\mathbf{B}+\lambda \mathbf{P}\), where \(\mathbf{B}\) is the evaluation matrix of the Bernstein basis polynomials and \(\mathbf{P}\) is the penalty matrix. With \(K\) triangles in the triangulation, \(\mathbf{B}^T\mathbf{B}+\lambda \mathbf{P}\) is a \(\{K(d+1)(d+2)/2\} \times \{K(d+1)(d+2)/2\}\) matrix. This matrix can be very large and the associated linear algebra is expensive, especially when the domain is irregular with a curvy boundary and a large number of small triangles is needed to cover it well. However, thanks to the local support of the Bernstein basis polynomials, both \(\mathbf{B}^T\mathbf{B}\) and \(\mathbf{P}\) are block diagonal, with one block of dimension \(\{(d+1)(d+2)/2\}\times \{(d+1)(d+2)/2\}\) per triangle. Exploiting this block structure, together with the sparsity of \(\mathbf{B},\, \mathbf{P}\) and \(\mathbf{H}\), substantially reduces the computational cost; storing only the non-zero elements also saves memory.

5 Numerical results

This section illustrates the proposed penalized bivariate spline method using a simulation study and a real dataset. The simulation study also compares the proposed method with several existing bivariate smoothers.

5.1 Simulation of model fitting on a complicated domain

Our simulation study used the horseshoe-shaped domain (referred to as the target domain hereafter) shown in Fig. 5. Such a domain has been used previously in the literature to illustrate smoothing over difficult domains (Ramsay 2002; Wood et al. 2008).

Fig. 5

Comparison of four surface smoothers for estimating a function over a horseshoe-shaped domain for a simulated dataset with noise level \(\sigma =1\). The top left panel shows an image plot with contour lines of the true surface. The middle and bottom panels show image plots of the estimation errors of the penalized bivariate splines (PBS), the soap film smoother (SOAP), thin-plate regression splines (TPRS), and tensor-product splines (TPS). The top right panel shows the triangulation used by PBS

The surface function to be recovered from noisy data is

$$\begin{aligned} f(x,y) = 8\,\sin (xy) + 0.5\,x^2\, I(x>0, y>0) -0.5\, x^2\, I(x>0, y<0), \end{aligned}$$
(30)

whose contour plot is shown in the top left panel of Fig. 5. This function is smooth over the manifold of the horseshoe-shaped domain but is not smooth over \(R^2\). We uniformly sampled \(n=500\) locations in the rectangle \([-1, 4] \times [-1, 1]\) and evaluated the function only at those locations falling in the target domain. We then added i.i.d. noise from the \(N(0,\sigma ^2)\) distribution to the function values to obtain the simulated data. Three noise levels, \(\sigma =.1,\, 1\), and \(2\), were considered. The goal is to estimate the surface function (30) from the noisy data.
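A sketch of this data-generating step in Python with NumPy is given below; since the paper does not give an explicit formula for the horseshoe boundary, the membership test is left as a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x, y):
    """The test surface (30)."""
    return (8 * np.sin(x * y)
            + 0.5 * x**2 * ((x > 0) & (y > 0))
            - 0.5 * x**2 * ((x > 0) & (y < 0)))

def in_horseshoe(x, y):
    """Placeholder membership test for the horseshoe domain of Fig. 5;
    substitute any indicator function for the region used in the paper."""
    raise NotImplementedError

# Sample n = 500 locations uniformly over [-1, 4] x [-1, 1] and add
# N(0, sigma^2) noise to the surface values, as in the simulation design.
n, sigma = 500, 1.0
x = rng.uniform(-1.0, 4.0, n)
y = rng.uniform(-1.0, 1.0, n)
# keep = in_horseshoe(x, y); x, y = x[keep], y[keep]   # restrict to domain
z = f_true(x, y) + rng.normal(0.0, sigma, size=n)
```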

We applied the proposed penalized bivariate spline method to the data using the triangulation depicted in the top right panel of Fig. 5, with \(d=3\) and \(r=1\) for the bivariate splines and the thin-plate spline penalty to regularize the function estimation. We also applied three other surface smoothers from the literature: the soap film smoother (Wood et al. 2008), thin-plate regression splines (Wood 2003), and tensor-product splines, using their implementations in the mgcv R package (Wood 2006). We used 60 basis functions for the soap film smoother and the thin-plate regression splines, and 64 basis functions (8 marginal basis functions per variable) for the tensor-product splines. All four methods use roughness penalties to regularize the fit: the soap film smoother uses the squared Laplacian penalty, while the other three use the thin-plate spline penalty. The GCV criterion is used by all four methods to select the penalty parameters. Figure 5 shows the estimation errors of the four smoothing methods for a simulated dataset at noise level \(\sigma =1\). The penalized bivariate spline method gives overall smaller estimation errors than the other three methods; similar behavior was observed for other datasets and noise levels.

To assess the performance of the bivariate smoothers systematically, we ran 400 simulations for each noise level. For each run, we calculated the integrated squared error (ISE) over the target domain of each method's function estimate. Boxplot summaries of the results are given in Fig. 6. The proposed penalized bivariate spline method clearly outperforms the other three smoothers. The thin-plate regression splines and the tensor-product splines give higher ISEs because neither takes into account the manifold structure of the domain. The soap film smoother is designed to handle difficult regions; its inferior performance in this example may be attributed to the extra effort needed to estimate the boundary functions.

Fig. 6

Comparison of four surface smoothers for estimating a function over a horseshoe-shaped domain at three noise levels, \(\sigma =.1,\, 1\), and \(2\). The four smoothers are the penalized bivariate splines (PBS), soap film smoother (SOAP), thin-plate regression splines (TPRS), and tensor-product splines (TPS). Boxplots of ISEs are shown based on 400 simulation runs at each noise level

5.2 Texas temperature data

In this section, we apply the penalized bivariate spline method to a real data example: constructing the temperature surface over the state of Texas from data collected by the International Research Institute for Climate and Society. The dataset consists of monthly average temperatures (in \({}^{\circ }\hbox {C}\)) at 52 weather stations in Texas from January 1867 to December 1995. Figure 7 shows the locations of the stations, the temperature records from two selected stations, and the stations reporting in two selected months. In some time periods, most stations have no record; for example, only 15 stations have temperature records in 1989 and 11 in 1990, but almost all stations have records between 1930 and 1987.

Fig. 7

Texas monthly temperature data. Top left panel: the locations of all weather stations on the Texas map are marked as triangles; the triangulation used for the penalized bivariate spline method is shown in grey lines. Top right panel: time series plot of the available data for two stations; the gaps indicate missing data at those times. Lower panels: the stations with observations in August and December 1987 are shown as triangles, with the size of the triangle indicating the magnitude of the temperature; stations without observations are marked with a plus sign

We applied the penalized bivariate spline method to data from four selected months (August of 1986 and 1987, and December of 1986 and 1987), reconstructing the temperature surface for each month from all stations reporting in that month. We used cubic bivariate splines (\(d = 3\)) on the triangulation shown in the top left panel of Fig. 7 and imposed continuous first-order directional derivatives across shared edges (\(r = 1\)). The thin-plate spline penalty was used to regularize the estimation and GCV was used to select the penalty parameters. The estimated temperature surfaces depicted in Fig. 8 show the following patterns: the spatial temperature variation is smaller in August than in December, and there is little variation in the eastern half of Texas in August. Moreover, the December temperatures show a clear latitude effect: temperature increases as latitude decreases.

Fig. 8

Image plots with contour lines of the estimated temperature surfaces for four selected months, using penalized bivariate splines