Abstract
Two effective data-based procedures for selecting the number of bins in a histogram are proposed; both are simple and fast to compute. The idea is to choose the number of bins that minimizes the circumference of the frequency histogram, or a bootstrap estimate of its expected circumference. Unlike most rules derived in the literature, our method therefore does not depend on precise asymptotic analyses. An extensive Monte Carlo study shows that our selectors perform well compared with selectors recently suggested in the literature, over a wide range of density functions and sample sizes. The behaviour of one of the proposed rules is also illustrated on real data sets.
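The abstract describes both selectors only in words. The following is a minimal Python sketch of the idea under stated assumptions, not the authors' implementation: bins are equal-width and span the sample range, bar heights are raw bin counts, "circumference" is read as the length of the closed outline of the bars, and the bootstrap variant uses a plain nonparametric bootstrap with the bin edges held fixed. The paper's exact definitions, search range and resampling scheme may differ; the function names and the defaults `k_max=50` and `n_boot=100` are hypothetical.

```python
import random
from typing import List, Sequence

# Assumptions (hypothetical, not taken from the paper): equal-width bins on
# [min, max], heights = raw counts, plain nonparametric bootstrap.


def bin_counts(data: Sequence[float], k: int, lo: float, hi: float) -> List[int]:
    """Counts for k equal-width bins on [lo, hi]; the last bin is closed on the right."""
    counts = [0] * k
    width = (hi - lo) / k
    for x in data:
        j = int((x - lo) / width) if width > 0 else 0
        # Clamp so x == hi (and tiny float round-off) lands in a valid bin.
        counts[min(max(j, 0), k - 1)] += 1
    return counts


def circumference(counts: Sequence[int], lo: float, hi: float) -> float:
    """Length of the closed outline of a frequency histogram (heights = raw counts).

    Horizontal part: bar tops plus baseline, 2 * (hi - lo), constant in k.
    Vertical part: rise to the first bar, each step between neighbouring
    bars, and the drop back to the baseline after the last bar.
    """
    vertical = counts[0] + counts[-1]
    vertical += sum(abs(b - a) for a, b in zip(counts, counts[1:]))
    return 2.0 * (hi - lo) + vertical


def select_bins(data: Sequence[float], k_max: int = 50) -> int:
    """Number of bins in 1..k_max that minimizes the observed circumference."""
    lo, hi = min(data), max(data)
    scores = {k: circumference(bin_counts(data, k, lo, hi), lo, hi)
              for k in range(1, k_max + 1)}
    return min(scores, key=scores.get)  # ties go to the smaller k


def select_bins_bootstrap(data: Sequence[float], k_max: int = 50,
                          n_boot: int = 100, seed: int = 0) -> int:
    """Number of bins minimizing a bootstrap estimate of the expected circumference.

    Resamples the data with replacement and keeps the bin edges fixed on
    the original sample range [lo, hi].
    """
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    totals = [0.0] * (k_max + 1)  # index 0 unused
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))
        for k in range(1, k_max + 1):
            totals[k] += circumference(bin_counts(resample, k, lo, hi), lo, hi)
    # Dividing by n_boot would not change the argmin, so the sums suffice.
    return min(range(1, k_max + 1), key=lambda k: totals[k])


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
    print("direct:", select_bins(sample))
    print("bootstrap:", select_bins_bootstrap(sample, n_boot=50))
```

Under these assumptions the horizontal part of the outline is the same for every number of bins, so only the vertical variation of the counts drives the choice: too few bins give tall bars with large rises and falls, while too many bins add jagged steps caused by sampling noise.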
References
Berlinet, A. and Devroye, L. (1994) A comparison of kernel density estimates. Publications de l'Institut de Statistique de l'Université de Paris, 38, 3-59.
De Jager, O. C. (1994) On periodicity tests and flux limit calculations for gamma-ray pulsars. The Astrophysical Journal, 436, 239-248.
Devroye, L. (1989) The double kernel method in density estimation. Annales de l'Institut Henri Poincaré, 25, 533-580.
Devroye, L. and Györfi, L. (1985) Nonparametric density estimation: The L1 view. John Wiley & Sons, New York.
Efron, B. (1979) Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7, 1-26.
Hall, P. and Wand, M. P. (1988) Minimizing L1 distance in nonparametric density estimation. Journal of Multivariate Analysis, 26, 59-88.
He, K. and Meeden, G. (1997) Selecting the number of bins in a histogram: A decision theoretical approach. Journal of Statistical Planning and Inference, 61, 49-59.
Janssen, P., Marron, J. S., Veraverbeke, N. and Sarle, W. (1995) Scale measures for bandwidth selection. Journal of Nonparametric Statistics, 5, 359-380.
Marron, J. S. and Wand, M. P. (1992) Exact mean integrated squared error. The Annals of Statistics, 20, 712-736.
Mayer-Hasselwander et al. (1994) High energy gamma radiation from Geminga observed by EGRET. The Astrophysical Journal, 421, 276-283.
Scott, D. W. (1979) On optimal and data-based histograms. Biometrika, 66, 605-610.
Scott, D. W. (1992) Multivariate density estimation: Theory, practice and visualization. John Wiley & Sons, New York.
Simonoff, J. S. and Udina, F. (1997) Measuring the stability of histogram appearance when the anchor position is changed. Computational Statistics and Data Analysis, 23, 335-353.
Sturges, H. A. (1926) The choice of a class interval. Journal of the American Statistical Association, 21, 65-66.
Terrell, G. R. (1990) The maximal smoothing principle in density estimation. Journal of the American Statistical Association, 85, 470-477.
Terrell, G. R. and Scott, D. W. (1985) Oversmoothed nonparametric density estimates. Journal of the American Statistical Association, 80, 209-214.
Wand, M. P. (1997) Data-based choice of histogram bin width. The American Statistician, 51, 59-64.
Cite this article
Beer, C.F.D., Swanepoel, J.W.H. Simple and effective number-of-bins circumference selectors for a histogram. Statistics and Computing 9, 27–35 (1999). https://doi.org/10.1023/A:1008858025515