Abstract
The multiscale local polynomial transform, developed in this paper, combines the benefits of local polynomial smoothing with sparse multiscale decompositions. The contribution of the paper is twofold. First, it focuses on the bandwidths used throughout the transform. These bandwidths operate as user-controlled scales in a multiscale analysis, which is of particular interest in the case of nonequispaced data. The paper presents both a likelihood-based optimal bandwidth selection and a fast, heuristic approach. The second contribution is the combination of local polynomial smoothing with orthogonal prefilters, similar to Daubechies' wavelet filters, but defined on irregularly spaced covariate values.







Acknowledgments
Research support by the IAP research network Grant Nr. P7/06 of the Belgian government (Belgian Science Policy) is gratefully acknowledged.
Appendices
Appendix 1: Construction of prefilters using elementary orthogonal matrices
This section develops the details of the construction of prefilters using the form in (25). We assume that appropriate choices of \(\gamma _j\) and \(\widetilde{{\mathbf {J}}}_j\) allow us to define a real matrix \(\widetilde{{\mathbf {S}}}_j\) in (24). In particular, as in Lemma 3, the subsampling operation \(\widetilde{{\mathbf {J}}}_j\) leaves out not only the odd indices from \({\varvec{s}}_{j+1}\), but also \({\widetilde{p}}-(2n_j-n_{j+1})\) even indices near the boundary. Near the boundary, the filtering is replaced by simple subsampling, meaning for instance that \(s_{j,0} = s_{j+1,0}\), thus allowing us to focus on the interior. Obviously, alternative solutions may filter near the boundary, at the price of slightly less variance reduction in the interior.
The initial step of the iteration creates the matrix \( \widetilde{{\mathbf {U}}}_j^{[0]} = \widetilde{{\mathbf {S}}}_j \widetilde{{\mathbf {Q}}}_j^{[0]} - \widetilde{{\mathbf {J}}}_j\widetilde{{\mathbf {V}}}_j^T. \) As the primary goal of this step is to map the \(n_j \times n_j\) matrix \(\widetilde{{\mathbf {S}}}_j\) onto the \(n_j \times (n_{j+1}-{\widetilde{p}})\) matrix \( \widetilde{{\mathbf {S}}}_j^{[0]} = \widetilde{{\mathbf {S}}}_j \widetilde{{\mathbf {Q}}}_j^{[0]}, \) the matrix \(\widetilde{{\mathbf {Q}}}_j^{[0]}\) is chosen to be a possibly row-permuted submatrix of the \((n_{j+1}-{\widetilde{p}}) \times (n_{j+1}-{\widetilde{p}})\) identity matrix. The resulting matrix \(\widetilde{{\mathbf {S}}}_j^{[0]}\) contains the columns of \(\widetilde{{\mathbf {S}}}_j\), completed with zero columns. Since \(\Vert {\mathbf {A}}-{\mathbf {B}} \Vert _F^2 = \Vert {\mathbf {A}} \Vert _F^2+\Vert {\mathbf {B}} \Vert _F^2-2\,\mathrm {Tr}({\mathbf {A}}^T{\mathbf {B}})\), minimising the norm of \(\widetilde{{\mathbf {U}}}_j^{[0]}\) amounts to maximising the trace term, which can be done by looking at the maximum values in each row of \( \widetilde{{\mathbf {S}}}_j^T \left( \widetilde{{\mathbf {J}}}_j\widetilde{{\mathbf {V}}}_j^T\right) . \)
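A minimal sketch of this initial step, with illustrative names (S for \(\widetilde{{\mathbf {S}}}_j\), T for \(\widetilde{{\mathbf {J}}}_j\widetilde{{\mathbf {V}}}_j^T\)); it solves the trace maximisation exactly as a linear assignment problem, of which the row-wise maxima above are a greedy approximation:

import numpy as np
from scipy.optimize import linear_sum_assignment

def initial_placement(S, T):
    # S: (n, n) matrix S~_j; T: (n, m) target J~_j V~_j^T with m >= n.
    # Returns Q0, a row-permuted submatrix of the m x m identity, such
    # that S @ Q0 holds the columns of S, completed with zero columns,
    # and ||S @ Q0 - T||_F is minimal over all such placements.
    n, m = T.shape
    M = S.T @ T                          # row i: inner products of column i of S with T
    rows, cols = linear_sum_assignment(M, maximize=True)
    Q0 = np.zeros((n, m))
    Q0[rows, cols] = 1.0                 # column cols[i] of S @ Q0 equals column rows[i] of S
    return Q0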
The next matrices \(\widetilde{{\mathbf {Q}}}_j^{[i]}\) consist of products of elementary Givens rotations or reflections. More precisely, \( \widetilde{{\mathbf {Q}}}_j^{[i]} = \prod _{\ell } \widetilde{{\mathbf {Q}}}_j^{[i,\ell ]}, \)
where the elements of \(\widetilde{{\mathbf {Q}}}_j^{[i,\ell ]}\) coincide with those of \({\mathbf {I}}_j\), except in the \(2 \times 2\) submatrix at rows \((r_1,r_2)\) and columns \((r_1,r_2)\), where the couple \((r_1,r_2)\) depends on \(\ell \). This submatrix is equal to \( \begin{pmatrix} \cos \alpha _\ell & -(-1)^{b_\ell }\sin \alpha _\ell \\ \sin \alpha _\ell & (-1)^{b_\ell }\cos \alpha _\ell \end{pmatrix}, \)
where the binary \(b_\ell \) and the real \(\alpha _\ell \in [0,2\pi ]\) are chosen to minimize the Frobenius norm of the outcome \(\widetilde{{\mathbf {U}}}_j^{[i,\ell ]} = \widetilde{{\mathbf {S}}}_j^{[i,\ell -1]}\widetilde{{\mathbf {Q}}}_j^{[i,\ell ]} -\widetilde{{\mathbf {J}}}_j\widetilde{{\mathbf {V}}}_j^T\). Since right multiplication with \(\widetilde{{\mathbf {Q}}}_j^{[i,\ell ]}\) affects columns \(r_1\) and \(r_2\) only, the optimal values of \(b_\ell \) and \(\alpha _\ell \) are easy to find. A slight modification consists in weighting the elements of the matrix in the calculation of the Frobenius norm. More precisely, by putting zero weights on the elements near the diagonal of \(\widetilde{{\mathbf {U}}}_j^{[i,\ell ]}\), this matrix and also \(\widetilde{{\mathbf {F}}}_j^{[i,\ell ]}\) are pushed towards diagonal dominance.
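For a given column pair \((r_1,r_2)\), finding the optimal elementary factor is a two-column orthogonal Procrustes problem. A minimal sketch of one such update, replacing the explicit search over \((b_\ell ,\alpha _\ell )\) by the equivalent closed-form \(2 \times 2\) SVD (unweighted Frobenius norm only; the weighted variant would require a one-dimensional search over \(\alpha _\ell \)):

import numpy as np

def best_elementary_factor(S, T, r1, r2):
    # Finds the 2x2 orthogonal matrix R (a rotation if det R = 1, a
    # reflection if det R = -1) acting on columns r1, r2 of S that
    # minimises ||S Q - T||_F; the other columns are unaffected.
    A = S[:, [r1, r2]]
    B = T[:, [r1, r2]]
    U, _, Vt = np.linalg.svd(A.T @ B)    # 2x2 SVD
    R = U @ Vt                           # best orthogonal factor (Procrustes)
    S_new = S.copy()
    S_new[:, [r1, r2]] = A @ R
    return S_new, R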
Appendix 2: Proof of Lemma 2
First, the polynomial reproducing condition (17) represents \(n_j{\widetilde{p}}\) linear equations, independently of the number of zero elements in \(\widetilde{{\mathbf {F}}}_j\). Second, the diagonal of (16) adds \(n_j\) non-linear, non-homogeneous equations. The homogeneous equations corresponding to the off-diagonal elements express orthogonality between rows of \(\widetilde{{\mathbf {F}}}_j\). These equations are inactive if the nonzeros of the rows under consideration do not overlap. Ignoring boundary effects, this amounts to \(\left\lceil (r-l)/2\right\rceil -1\) active equations on each row of (16). Third, from the proof of Lemma 1, it follows that the optimal variance reduction in (16) implies that all columns of \(\widetilde{{\mathbf {F}}}_j\) add up to the same value, more precisely \( \widetilde{{\mathbf {F}}}_j^T{\varvec{1}}_j = {\varvec{1}}_{j+1} (n_j/n_{j+1}). \) These \(n_{j+1}\) equations for the column sums are linearly independent of the row-sum equations in (17), except for the last one. In total, we have \(n_j{\widetilde{p}}+n_j+n_j(\left\lceil (r-l)/2\right\rceil -1)+n_{j+1}\) conditions, which amounts to at least \({\widetilde{p}}+2+\left\lceil (r-l)/2\right\rceil \) at each row. Therefore, the number of free parameters at each row, \(r-l\), must satisfy \((r-l) \ge {\widetilde{p}}+2+\left\lceil (r-l)/2\right\rceil \), which leads to the stated result. \(\square \)
In the case of equidistant covariate values, the prefilter \(\widetilde{{\mathbf {F}}}_j\) has only \(2{\widetilde{p}}\) nonzeros on each row. This can be understood from the fact that for equidistant covariates, \(\widetilde{{\mathbf {F}}}_j = \widetilde{{\mathbf {J}}}_j\overline{{\mathbf {F}}}_j\), where \(\overline{{\mathbf {F}}}_j\) is a Toeplitz matrix. The proof of Lemma 1 reveals that in the general case \(n_{j+1}-1\) linear equations follow from minimising the output variance, namely that all column sums \(\widetilde{{\mathbf {F}}}_j^T{\varvec{1}}_j\) have the same value. For \(\widetilde{{\mathbf {F}}}_j = \widetilde{{\mathbf {J}}}_j\overline{{\mathbf {F}}}_j\), with \(\overline{{\mathbf {F}}}_j\) Toeplitz, all even column sums are automatically equal to each other, and the same holds for the odd column sums, thereby reducing the \(n_{j+1}-1\) linear equations to a single one. This reduces the number of nonzeros on each row by two. Moreover, the shorter rows overlap less, which yields a further reduction by two elements.
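A small numerical sketch of this Toeplitz argument, using a hypothetical filter on an equidistant grid; the interior even column sums collapse to a single value, and likewise for the odd ones:

import numpy as np
from scipy.linalg import toeplitz

n1 = 16                                       # n_{j+1}
h = np.array([0.1, 0.25, 0.3, 0.25, 0.1])     # hypothetical filter taps
c = np.zeros(n1); c[:h.size] = h              # first column of Fbar_j
r = np.zeros(n1); r[0] = h[0]                 # first row of Fbar_j
Fbar = toeplitz(c, r)                         # banded Toeplitz matrix
F = Fbar[::2, :]                              # F~_j = J~_j Fbar_j: keep even rows
sums = F.sum(axis=0)
print(np.unique(np.round(sums[4:12:2], 12)))  # interior even columns: one value
print(np.unique(np.round(sums[5:12:2], 12)))  # interior odd columns: one value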
Appendix 3: Proof of Lemma 3
In this proof, \(\widetilde{{\mathbf {J}}}_j\) denotes a partitioning operation, which can be more general than the even-odd subsampling.
Let \(\widetilde{{\mathbf {X}}}_{j+1}^{(2)}\) be the orthonormalised columns of \({\mathbf {X}}_{j+1}^{(2)}\), i.e., \((\widetilde{{\mathbf {X}}}_{j+1}^{(2)})_{i,1} = n_{j+1}^{-1/2}\) and \((\widetilde{{\mathbf {X}}}_{j+1}^{(2)})_{i,2} = n_{j+1}^{-1/2} (x_{j+1,i} - \overline{{\varvec{x}}}_{j+1})/ (\overline{{\varvec{x}}^2}_{j+1}-\overline{{\varvec{x}}}_{j+1}^2)^{1/2}\). Then for any vector \({\varvec{c}}\) of length \(n_j\), we have
As \([\widetilde{{\mathbf {X}}}_{j+1}^{(2)} \widetilde{{\mathbf {V}}}_j^T]\) constitutes an orthogonal matrix, we have \( \Vert \widetilde{{\mathbf {X}}}_{j+1}^{(2)T}\widetilde{{\mathbf {J}}}_j^T{\varvec{c}} \Vert _2^2 + \Vert \widetilde{{\mathbf {V}}}_j\widetilde{{\mathbf {J}}}_j^T{\varvec{c}} \Vert _2^2 = \Vert \widetilde{{\mathbf {J}}}_j^T{\varvec{c}} \Vert _2^2 = \Vert {\varvec{c}} \Vert _2^2. \) From this it follows that
This holds for any vector \({\varvec{c}}\) if \( \Vert \widetilde{{\mathbf {X}}}_{j+1}^{(2)T}\widetilde{{\mathbf {J}}}_j^T \Vert _2^2 = \Vert \widetilde{{\mathbf {J}}}_j\widetilde{{\mathbf {X}}}_{j+1}^{(2)} \Vert _2^2 \le \gamma _j. \) By writing \(\rho ({\mathbf {A}})\) for the spectral radius of matrix \({\mathbf {A}}\), the squared matrix norm is
with \(\xi _j\) and \(\zeta _j\) as defined in (26) and (27). The squared matrix norm is equal to the minimum value of \(\gamma _j\) proposed in (28). \(\square \)
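A minimal sketch of this computation, assuming even-index subsampling and following the definition of \(\widetilde{{\mathbf {X}}}_{j+1}^{(2)}\) above; it evaluates the smallest admissible \(\gamma _j\) directly as the squared spectral norm:

import numpy as np

def min_gamma(x, keep):
    # Squared spectral norm of J~_j X~_{j+1}^{(2)}, with X~ the
    # orthonormalised constant and linear columns of the design, and
    # keep the row indices retained by the subsampling J~_j.
    n = x.size
    X = np.empty((n, 2))
    X[:, 0] = n ** -0.5
    X[:, 1] = n ** -0.5 * (x - x.mean()) / np.sqrt(np.mean(x**2) - x.mean()**2)
    JX = X[keep, :]                             # subsampled rows
    return np.linalg.eigvalsh(JX.T @ JX).max()  # largest eigenvalue of (JX)^T (JX)

x = np.sort(np.random.rand(64))                 # nonequispaced covariates
print(min_gamma(x, np.arange(0, 64, 2)))        # even-odd subsampling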
Appendix 4: Kullback–Leibler divergence of the approximate mixture model in Sect. 4.3
The error term in (37) consists of two terms \( R(p_j,a_j,\tau _j,\sigma _{j,k}) = R_0(p_j,a_j,\tau _j,\sigma _{j,k},\lambda _j) + R_1(p_j,a_j,\tau _j,\sigma _{j,k},\lambda _j). \) These terms are defined as
Since
we can rewrite
where the following definition of \(R_{00}\) uses the shorthand notation \(\epsilon = \sqrt{\sigma ^2+\tau ^2}\),

In this expression, we can bound the second factor of the second term by
provided that \(P(|{\widetilde{D}}|\le d|M=1) \le P(|{\widetilde{D}}|\le d|M=0)\) for any positive d, which is true for \(a\epsilon \) sufficiently close to zero. In the first factor of the second term, we use that \(g(u;a,\sigma )/\phi _\epsilon (u)\) is symmetric on \([-\lambda \epsilon ,\lambda \epsilon ]\) and non-decreasing on \([0,\lambda \epsilon ]\),

This factor is bounded below by \(-p/(1-p)\) and above by \({\mathcal {O}}(\lambda ^2)\), while the values of a, \(\sigma \), and \(\epsilon \) have no impact on these bounds.
In the first term we have, for p small,

This is further rewritten and bounded as

Altogether, we have, for \(\lambda \rightarrow \infty \) and \(p \rightarrow 0\),
For \(R_1(p_j,a_j,\tau _j,\sigma _{j,k},\lambda _j)\), we use that \( {\widetilde{p}}_\lambda = P({\widetilde{M}}=1|{\varvec{X}}) = p \, 2\left[ 1-G(\lambda \epsilon ;a,\sigma )\right] + (1-p) \, 2\left[ 1-{\varPhi }(\lambda )\right] . \) For \(\lambda \epsilon \rightarrow \infty \) and for \(a \rightarrow 0\), so that \(a\lambda \epsilon \rightarrow 0\), it can be verified that \(2\left[ 1-G(\lambda \epsilon ;a,\sigma )\right] \rightarrow 1\), and thus \({\widetilde{p}}_\lambda /p \rightarrow 1\). The condition \(a\lambda \epsilon \rightarrow 0\) allows \(\lambda \rightarrow \infty \), but not too fast; otherwise, the threshold would take away too many large coefficients. Furthermore, as

this term is bounded from below and from above by
where \(r_1(d) = g(d;a,\sigma )/{\widetilde{f}}_1(d;a,\lambda \epsilon )\) and \(r_0(d) = \phi (d/\epsilon )/[\epsilon {\widetilde{f}}_1(d;a,\lambda \epsilon )]\). The function \(r_1(d)\) reaches a global minimum at \(d=0\), from where it increases monotonically from \(r_1(0) = 2[1-{\varPhi }(a\sigma )]\exp (\sigma ^2a^2/2)\) towards \(r_1(\pm \infty ) = \exp (\sigma ^2a^2/2)\). On \(]-\infty ,-\lambda \epsilon ]\cup [\lambda \epsilon ,\infty [\), and for \(\lambda > a\epsilon \), the function \(r_0(d)\) decreases monotonically and rapidly from \(r_0(\lambda \epsilon ) = \exp (-\lambda ^2/2)/(\sqrt{\pi /2}\,a\epsilon )\) to 0. The bounds for \(R_1(p,a,\tau ,\sigma ,\lambda )\) are then
from which we conclude that \(R_1(p,a,\tau ,\sigma ,\lambda ) \sim {\widetilde{p}}_\lambda \sim p\), when \(a \rightarrow 0\) and \(\lambda \rightarrow \infty \), so that \(\exp (-\lambda ^2/2)/a \rightarrow 0\). This means that \(\lambda \) should grow sufficiently fast in order to prevent false positives from dominating the error term \(R_1\). \(\square \)
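A small numerical sketch of the behaviour of \({\widetilde{p}}_\lambda /p\) in this regime, taking a plain Laplace density with rate a as a stand-in for \(g(u;a,\sigma )\) (an assumption for illustration only, for which \(2[1-G(\lambda \epsilon ;a)] = \exp (-a\lambda \epsilon )\)):

import numpy as np
from scipy.stats import norm

p, eps = 0.01, 1.0
for lam in [2.0, 4.0, 8.0, 16.0]:
    a = lam ** -2                  # so that a*lam*eps -> 0 and exp(-lam**2/2)/a -> 0
    p_lam = p * np.exp(-a * lam * eps) + (1 - p) * 2 * (1 - norm.cdf(lam))
    print(lam, p_lam / p)          # the ratio tends to 1 as lam grows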