Abstract
We present an approach to computing high-breakdown regression estimators in parallel on graphics processing units (GPUs). We show that sorting the residuals is not necessary and can be replaced by computing the median. We present and compare various methods for calculating the median and order statistics on GPUs. We introduce an alternative method based on the optimization of a convex function and show its numerical superiority when calculating the order statistics of very large arrays on GPUs.
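To make the idea of obtaining the median without sorting concrete, here is a minimal sketch. It relies on the standard fact that a median minimizes the convex function f(y) = sum of |x_i - y|. The abstract does not state which objective or solver the authors use, so both the objective and the plain bisection below are assumptions chosen for illustration, and the function name median_by_bisection is hypothetical. Each iteration needs only counts of elements below and above a trial point, which are reductions of the kind that map naturally onto a GPU; the sketch itself runs on the CPU in plain Python.

```python
def median_by_bisection(xs, tol=1e-12, max_iter=200):
    """Approximate a minimizer of f(y) = sum(|x - y| for x in xs) without sorting.

    Illustrative sketch only: bisection on the subgradient of f,
    not necessarily the optimization method used in the paper.
    """
    if not xs:
        raise ValueError("xs must be non-empty")
    lo, hi = min(xs), max(xs)  # a minimizer always lies in [lo, hi]
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        # Two counting reductions; on a GPU these would be parallel reductions.
        less = sum(1 for x in xs if x < mid)
        greater = sum(1 for x in xs if x > mid)
        equal = len(xs) - less - greater
        # The subdifferential of f at mid is [slope - equal, slope + equal].
        slope = less - greater
        if abs(slope) <= equal:
            return mid  # zero is a subgradient, so mid minimizes f
        if slope < 0:
            lo = mid    # f is decreasing at mid: the minimizer is to the right
        else:
            hi = mid    # f is increasing at mid: the minimizer is to the left
    # Snap to the nearest data element (one more reduction).
    mid = 0.5 * (lo + hi)
    return min(xs, key=lambda x: abs(x - mid))
```

For an odd number of elements the minimizer is unique and equals the sample median. For an even number, every point between the two middle elements minimizes f, so the sketch may return any value in that interval rather than the conventional midpoint. The final snapping step assumes distinct data values are separated by more than tol.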
Cite this article
Beliakov, G., Johnstone, M. & Nahavandi, S. Computing of high breakdown regression estimators without sorting on graphics processing units. Computing 94, 433–447 (2012). https://doi.org/10.1007/s00607-011-0183-7
DOI: https://doi.org/10.1007/s00607-011-0183-7