Interpolation of sparse high-dimensional data

Abstract

Increases in the quantity of available data have allowed all fields of science to generate more accurate models of multivariate phenomena. Regression and interpolation become challenging when the dimension of data is large, especially while maintaining tractable computational complexity. Regression is a popular approach to solving approximation problems with high dimension; however, there are often advantages to interpolation. This paper presents a novel and insightful error bound for (piecewise) linear interpolation in arbitrary dimension and contrasts the performance of some interpolation techniques with popular regression techniques. Empirical results demonstrate the viability of interpolation for moderately high-dimensional approximation problems, and encourage broader application of interpolants to multivariate approximation in science.

References

  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems. Software available from https://www.tensorflow.org/ (2015)

  2. Barthelmann, V., Novak, E., Ritter, K.: High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12(4), 273–288 (2000)

  3. Basak, D., Pal, S., Patranabis, D.C.: Support vector regression. Neural Inf. Process. Lett. Rev. 11(10), 203–224 (2007)

  4. Bengio, Y., Grandvalet, Y.: No unbiased estimator of the variance of k-fold cross-validation. J. Mach. Learn. Res. 5, 1089–1105 (2004)

  5. de Boor, C.: A Practical Guide to Splines, vol. 27. Springer, New York (1978)

  6. de Boor, C., Höllig, K., Riemenschneider, S.: Box Splines, vol. 98. Springer Science & Business Media (2013)

  7. Bos, L., De Marchi, S., Sommariva, A., Vianello, M.: Computing multivariate Fekete and Leja points by numerical linear algebra. SIAM J. Numer. Anal. 48(5), 1984–1999 (2010)

  8. Bungartz, H.J., Griebel, M.: Sparse grids. Acta Numer. 13, 147–269 (2004)

  9. Cameron, K.W., Anwar, A., Cheng, Y., Xu, L., Li, B., Ananth, U., Bernard, J., Jearls, C., Lux, T.C.H., Hong, Y., Watson, L.T., Butt, A.R.: MOANA: modeling and analyzing I/O variability in parallel system experimental design. IEEE Trans. Parallel Distrib. Syst. https://doi.org/10.1109/TPDS.2019.2892129 (2019)

  10. Chang, T.H., Watson, L.T., Lux, T.C.H., Li, B., Xu, L., Butt, A.R., Cameron, K.W., Hong, Y.: A polynomial time algorithm for multivariate interpolation in arbitrary dimension via the Delaunay triangulation. In: Proceedings of the ACMSE 2018 Conference, pp. 12:1–12:8. ACM, New York. https://doi.org/10.1145/3190645.3190680 (2018)

  11. Cheney, E.W., Light, W.A.: A Course in Approximation Theory, vol. 101. American Mathematical Society (2009)

  12. Chkifa, A., Cohen, A., Schwab, C.: High-dimensional adaptive sparse polynomial interpolation and applications to parametric PDEs. Found. Comput. Math. 14(4), 601–633 (2014)

  13. Chollet, F., et al.: Keras. https://keras.io (2015)

  14. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289 (2015)

  15. Cortez, P., Morais, A.d.J.R.: A data mining approach to predict forest fires using meteorological data. In: Proceedings of the 13th Portuguese Conference on Artificial Intelligence (2007)

  16. Cortez, P., Silva, A.M.G.: Using data mining to predict secondary school student performance. In: Proceedings of the 5th Annual Future Business Technology Conference, Porto (2008)

  17. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)

  18. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8609–8613. IEEE (2013)

  19. De Vito, S., Massera, E., Piga, M., Martinotto, L., Di Francia, G.: On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sens. Actuators B 129(2), 750–757 (2008)

  20. Dennis, J.E. Jr., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations, vol. 16. SIAM (1996)

  21. Dirichlet, G.L.: Über die Reduction der positiven quadratischen Formen mit drei unbestimmten ganzen Zahlen. J. Reine Angew. Math. 40, 209–227 (1850)

  22. Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19(1), 1–67 (1991)

  23. Friedman, J.H.: Fast MARS. Technical report, Computational Statistics Laboratory, Stanford University (1993)

  24. Fritsch, F.N., Carlson, R.E.: Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. 17(2), 238–246 (1980)

  25. Goh, G.: Why momentum really works. Distill. https://doi.org/10.23915/distill.00006. http://distill.pub/2017/momentum (2017)

  26. Gordon, W.J., Wixom, J.A.: Shepard's method of "metric interpolation" to bivariate and multivariate interpolation. Math. Comput. 32(141), 253–264 (1978)

  27. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)

  28. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol. 14, pp. 1137–1145. Montreal (1995)

  29. Kövari, T., Pommerenke, C.: On the distribution of Fekete points. Mathematika 15(1), 70–75 (1968)

  30. Lazos, D., Sproul, A.B., Kay, M.: Optimisation of energy management in commercial buildings with weather forecasting inputs: a review. Renew. Sustain. Energy Rev. 39, 587–603 (2014)

  31. Lee, D.T., Schachter, B.J.: Two algorithms for constructing a Delaunay triangulation. Int. J. Comput. Inf. Sci. 9(3), 219–242 (1980)

  32. Lilliefors, H.W.: On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J. Am. Stat. Assoc. 62(318), 399–402 (1967)

  33. Lux, T.C.H., Pittman, R., Shende, M., Shende, A.: Applications of supervised learning techniques on undergraduate admissions data. In: Proceedings of the ACM International Conference on Computing Frontiers, pp. 412–417. ACM (2016)

  34. Lux, T.C.H., Watson, L.T., Chang, T.H., Bernard, J., Li, B., Yu, X., Xu, L., Back, G., Butt, A.R., Cameron, K.W., et al.: Nonparametric distribution models for predicting and managing computational performance variability. In: SoutheastCon 2018, pp. 1–7. IEEE (2018)

  35. Lux, T.C.H., Watson, L.T., Chang, T.H., Bernard, J., Li, B., Yu, X., Xu, L., Back, G., Butt, A.R., Cameron, K.W., et al.: Novel meshes for multivariate interpolation and approximation. In: Proceedings of the ACMSE 2018 Conference, Article 13. ACM (2018)

  36. Migliorati, G., Nobile, F., von Schwerin, E., Tempone, R.: Approximation of quantities of interest in stochastic PDEs by the random discrete L^2 projection on polynomial spaces. SIAM J. Sci. Comput. 35(3), A1440–A1460 (2013)

  37. Møller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw. 6(4), 525–533 (1993)

  38. Navidi, W.C.: Statistics for Engineers and Scientists, 4th edn. McGraw-Hill Education (2015)

  39. Nobile, F., Tamellini, L., Tempone, R.: Convergence of quasi-optimal sparse-grid approximation of Hilbert-space-valued functions: application to random elliptic PDEs. Numer. Math. 134(2), 343–388 (2016)

  40. Norcott, W.D.: IOzone filesystem benchmark. http://www.iozone.org (2017). Accessed 12 Oct 2017

  41. Park, J.S.: Optimal Latin-hypercube designs for computer experiments. J. Stat. Plann. Inference 39(1), 95–111 (1994)

  42. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  43. Pozzolo, A.D., Caelen, O., Johnson, R.A., Bontempi, G.: Calibrating probability with undersampling for unbalanced classification. In: IEEE Symposium Series on Computational Intelligence, pp. 159–166. IEEE (2015). https://doi.org/10.1109/SSCI.2015.33. Data: https://www.kaggle.com/mlg-ulb/creditcardfraud. Accessed 25 Jan 2019

  44. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  45. Rudy, J., Cherti, M.: Py-earth: a Python implementation of multivariate adaptive regression splines. https://github.com/scikit-learn-contrib/py-earth (2017). Accessed 9 Jul 2017

  46. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Cogn. Model. 5(3), 1 (1988)

  47. Shepard, D.: A two-dimensional interpolation function for irregularly-spaced data. In: Proceedings of the 1968 23rd ACM National Conference, pp. 517–524. ACM (1968)

  48. Thacker, W.I., Zhang, J., Watson, L.T., Birch, J.B., Iyer, M.A., Berry, M.W.: Algorithm 905: SHEPPACK: modified Shepard algorithm for interpolation of scattered multivariate data. ACM Trans. Math. Softw. 37(3), 34 (2010)

  49. Tsanas, A., Little, M.A., McSharry, P.E., Ramig, L.O.: Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests. IEEE Trans. Biomed. Eng. 57(4), 884–893 (2010)

  50. Greiner, G., Hormann, K.: Interpolating and approximating scattered 3D data with hierarchical tensor product B-splines. In: Proceedings of Chamonix, p. 1 (1996)

  51. Williams, G.J.: Weather dataset (Rattle package). In: Rattle: A Data Mining GUI for R, vol. 1, pp. 45–55. The R Journal (2009). Data: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package. Accessed 25 Jan 2019

Funding

This work was supported by the National Science Foundation Grants CNS-1565314 and CNS-1838271.

Author information

Corresponding author

Correspondence to Thomas C. H. Lux.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Statistical Terminology

A random variable X is precisely defined by its cumulative distribution function (CDF) F_X and the derivative of the CDF, the probability density function (PDF) f_X. For any possible value x of X, the percentile of x is 100 F_X(x), the percentage of values drawn from X that would be less than or equal to x as the number of samples tends towards infinity. The quartiles of X are its 25th, 50th (median), and 75th percentiles. The absolute difference between the median and an adjacent quartile is an interquartile range. Given an independent and identically distributed sample from X, and presuming that X has finite mean and variance, a confidence interval can be constructed about any percentile estimated from the sample; a confidence interval is an interval that contains the true value of the estimated quantity with a specified probability. The null hypothesis is the statement (derived from some test statistic) that the expected value of the observed statistic equals an assumed population statistic. The p value of a given statistic value ρ for a given data set (sample from a distribution) is the probability of observing a statistic at least as extreme as ρ for other data sets (samples from that same distribution), assuming the null hypothesis holds. The smaller the p value, the stronger the statistical evidence for rejecting the null hypothesis. For a more detailed introduction to statistics and related terminology, see the work of Navidi [38].
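
To make these definitions concrete, the following minimal Python sketch (illustrative only, not code from the paper; the use of NumPy/SciPy and the synthetic normal sample are assumptions) computes quartiles, a bootstrap confidence interval for the median, and a Kolmogorov-Smirnov p value:

    # Illustrative sketch: quartiles, a bootstrap confidence interval,
    # and a KS-test p value for a synthetic i.i.d. sample.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=0.0, scale=1.0, size=500)  # i.i.d. sample from X

    # Quartiles: the 25th, 50th (median), and 75th percentiles.
    q25, median, q75 = np.percentile(sample, [25, 50, 75])

    # Bootstrap 95% confidence interval for the median: resample with
    # replacement, then take percentiles of the resampled medians.
    boot = np.median(rng.choice(sample, size=(2000, sample.size)), axis=1)
    ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

    # KS test against the standard normal CDF; a small p value is evidence
    # against the null hypothesis that the sample was drawn from N(0, 1).
    ks_stat, p_value = stats.kstest(sample, "norm")

    print(q25, median, q75, (ci_low, ci_high), ks_stat, p_value)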

Raw Numerical Results

The tables that follow show the precise experimental results for all data sets presented in Section 6. The tests were all run serially on an otherwise idle machine with a CentOS 6.10 operating system and an Intel i7-3770 CPU operating at 3.4 GHz. The detailed performance results in the tables that follow depend heavily on the problem and on the algorithm implementation (e.g., some codes are TOMS software, some are industry distributions, and others accompany conference papers). A different typeface marks the best performers; however, not much significance should be attached to ranking algorithms based on small (millisecond) timing differences. The results serve as a demonstration of conceptual validity.
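
As a purely illustrative aid for reading the tables below, the following hypothetical Python sketch (the model names and data are invented; this is not the paper's code) shows how per-algorithm "lowest absolute error" counts and two-sample KS statistics could be tabulated from predictions on a shared test set:

    # Hypothetical sketch: count per-model wins (lowest absolute error per
    # test point) and compute two-sample KS statistics versus the actuals.
    import numpy as np
    from scipy import stats

    actual = np.array([1.0, 0.0, 3.5, 2.2, 0.0])  # invented test values
    predictions = {  # invented per-model predictions on the same points
        "Delaunay": np.array([1.1, 0.2, 3.0, 2.0, 0.1]),
        "MARS":     np.array([0.7, 0.0, 4.1, 2.5, 0.3]),
        "MLP":      np.array([1.4, 0.5, 3.4, 2.1, 0.0]),
    }

    # For each test point, find which model has the lowest absolute error.
    names = list(predictions)
    errors = np.vstack([np.abs(p - actual) for p in predictions.values()])
    winners = np.argmin(errors, axis=0)  # ties go to the first model listed
    counts = {names[i]: int((winners == i).sum()) for i in range(len(names))}

    # Two-sample KS statistic between predicted and actual distributions.
    ks = {m: stats.ks_2samp(p, actual).statistic
          for m, p in predictions.items()}
    print(counts, ks)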

Fig. 20 Scatter plots of predicted versus actual values for the top three models on each of the four real-valued approximation problems. Top left is forest fire data, top right is Parkinson's data, bottom left is rainfall data, and bottom right is credit card transaction data. Each approximation algorithm has a unique style, and the top three algorithms are listed in order of ranking in the legends. A large number of zero-valued entries in the forest fire and rainfall data sets are not included in the visuals, making the true ranking of the models appear to disagree with the observed outcomes

Table 3 This numerical data accompanies the visual provided in Fig. 10
Table 4 The left table shows how often each algorithm had the lowest absolute error approximating forest fire data in Table 3
Table 5 This numerical data accompanies the visual provided in Fig. 12
Table 6 The left table shows how often each algorithm had the lowest absolute error approximating Parkinson's data in Table 5
Table 7 This numerical data accompanies the visual provided in Fig. 14
Table 8 The left table shows how often each algorithm had the lowest absolute error approximating Sydney rainfall data in Table 7
Table 9 This numerical data accompanies the visual provided in Fig. 16
Table 10 The left table shows how often each algorithm had the lowest absolute error approximating credit card transaction data in Table 9
Table 11 This numerical data accompanies the visual provided in Fig. 18
Table 12 The left table shows how often each algorithm had the lowest KS statistic on the I/O throughput distribution data in Table 11

Cite this article

Lux, T.C.H., Watson, L.T., Chang, T.H. et al. Interpolation of sparse high-dimensional data. Numer Algor 88, 281–313 (2021). https://doi.org/10.1007/s11075-020-01040-2
