Abstract
Parallel computers differ from conventional serial computers in that they can, in a variety of ways, perform more than one operation at a time. Parallel processing, the application of parallel computers, has been successfully utilized in many fields of science and technology. The purpose of this paper is to review efforts to use parallel processing for statistical computing. We present some technical background, followed by a review of the literature that relates parallel computing to statistics. The review material focuses explicitly on statistical methods and applications, rather than on conventional mathematical techniques. Thus, most of the review material is drawn from statistics publications. We conclude by discussing the nature of the review material and considering some possibilities for the future.
Cite this article
Adams, N.M., Kirby, S.P.J., Harris, P. et al. A review of parallel processing for statistical computation. Stat Comput 6, 37–49 (1996). https://doi.org/10.1007/BF00161572