ABSTRACT
The impact of computer science on statistical computing, while important, has not been as great as it should be. Part of this is caused by statisticians being unaware of relevant research in computer science. A related, and possibly more serious, problem is that computer scientists are not aware of some of the interesting problems in statistical computing. This paper will discuss several such problems. For example, statisticians are more concerned about “nice” case behavior than “worst” case behavior; their primary interest is in expected running time using realistic models for typical data sets. Improvements in algorithms can often be made in “nice” cases with little or no sacrifice in worst case behavior, by being optimistic as well as pessimistic in the design of algorithms. Statisticians are also often willing to accept approximate solutions or solutions which are asymptotically equivalent to the correct solution. A second example is the behavior of algorithms when data are stored in virtual memory or auxiliary memory. While some theoretical work has been done, only results on sorting and a few matrix operations have had an impact on statistical computing. There also seems to be a lack of models which give general predictions about the behavior of programs using virtual or auxiliary memory, especially for portable programs which are not designed for specific page sizes. A third example is the problem of explaining issues of numerical accuracy to users of statistical programs; e.g., if an ill-conditioned problem is diagnosed, it is necessary to explain to users what is wrong with their data. This might require finding a statistically meaningful index related to condition numbers. A fourth example is language design for statistical packages. These languages do not need to deal with the complex flow of control that programming languages do. On the other hand, they must be designed to be useful for the occasional and novice user.
As a final example, while the developing technology of software engineering is having a significant influence on the writing of statistical software (especially statistical packages), many questions remain to be answered, especially concerning the portability of large application programs.
Index Terms
- Statistics and computer science: Problems in statistical computing of interest to computer scientists