
Bump hunting in high-dimensional data

Statistics and Computing

Abstract

Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In addition, it is usually desired that these regions be describable in an interpretable form involving simple statements (“rules”) concerning the input values. This paper presents a procedure directed towards this goal based on the notion of “patient” rule induction. This patient strategy is contrasted with the greedy strategies used by most rule induction methods, and with the semi-greedy ones used by some tree-partitioning techniques such as CART. Applications involving scientific and commercial databases are presented.
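To make the goal concrete (a subregion of input space where the output's mean clearly exceeds its overall mean, found by a “patient” search), here is a minimal Python sketch of one way to read that idea. It assumes the subregions are axis-aligned boxes over numeric inputs and interprets “patient” as trimming only a small fraction `alpha` of the points still in the box at each step, from whichever single face most raises the box mean, until the box would fall below a minimum support. The function name `peel_box` and the parameters `alpha` and `min_support` are hypothetical choices for illustration; this is not the authors' published implementation.

```python
import numpy as np


def peel_box(X, y, alpha=0.05, min_support=0.1):
    """Top-down "patient" peeling of an axis-aligned box (illustrative sketch).

    X: (n, p) array of numeric inputs; y: (n,) array of outputs.
    alpha: fraction of the points still in the box trimmed per step (assumed).
    min_support: smallest fraction of all n points the box may keep (assumed).
    Returns per-variable lower and upper bounds and a mask of points in the box.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    lower, upper = X.min(axis=0), X.max(axis=0)
    inside = np.ones(n, dtype=bool)
    while True:
        xin, yin = X[inside], y[inside]
        best = None  # (box mean after peel, variable index, face, new bound)
        for j in range(p):
            for face, q in (("lower", alpha), ("upper", 1.0 - alpha)):
                bound = np.quantile(xin[:, j], q)
                keep = xin[:, j] >= bound if face == "lower" else xin[:, j] <= bound
                # Skip peels that remove nothing or leave too little support.
                if keep.all() or keep.sum() < min_support * n:
                    continue
                mean = yin[keep].mean()
                if best is None or mean > best[0]:
                    best = (mean, j, face, bound)
        if best is None:  # no admissible peel remains
            break
        _, j, face, bound = best
        if face == "lower":
            lower[j] = bound
        else:
            upper[j] = bound
        # Bounds only ever tighten, so this equals the old mask AND the kept points.
        inside = np.all((X >= lower) & (X <= upper), axis=1)
    return lower, upper, inside


# Usage on synthetic data: y is 1 only in the corner x0 > 0.7 and x1 > 0.7.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = ((X[:, 0] > 0.7) & (X[:, 1] > 0.7)).astype(float)
lower, upper, inside = peel_box(X, y)
print("box lower:", lower, "upper:", upper)
print("box mean:", y[inside].mean(), "overall mean:", y.mean())
```

In this reading, a small `alpha` is what keeps the search patient: no single step commits to a large cut, so later steps can still correct for an early choice. The resulting box is itself a simple rule (one interval per input variable).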


References

  • Barnett, V. (1976) The ordering of multivariate data (with discussion). J. Roy. Statist. Soc., A 139, 318–354.

  • Bishop, C. M. (1995) Neural Networks for Pattern Recognition. Oxford University Press.

  • Breiman, L. (1996) Bagging predictors. Machine Learning, 24, 123–140.

  • Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.

  • Clark, P. and Niblett, R. (1989) The CN2 induction algorithm. Machine Learning, 3, 261–284.

  • Cohen, W. W. (1995) Fast effective rule induction. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, CA (115–123). Morgan Kaufmann.

  • Donoho, D. and Gasko, M. (1992) Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Annals of Statistics, 20, 1803–1827.

  • Efron, B. and Tibshirani, R. J. (1993) An Introduction to the Bootstrap. Chapman and Hall.

  • Friedman, J. H. (1997) On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1, 55–77.

  • Green, P. J. (1981) Peeling bivariate data. In Interpreting Multivariate Data (V. Barnett, ed.). Wiley.

  • Griffin, W. L., Fisher, N. I., Friedman, J. H., Ryan, C. G., and O'Reilly, S. (1999) Cr-Pyrope garnets in lithospheric mantle. J. Petrology, to appear.

  • Hall, P. (1989) On projection pursuit regression. Annals of Statistics, 17, 573–588.

  • Lorentz, G. G. (1986) Approximation of Functions. Chelsea.

  • Mitchell, T. M. (1997) Machine Learning. McGraw-Hill.

  • Quinlan, J. R. (1990) Learning logical definitions from relations. Machine Learning, 5, 239–266.

  • Quinlan, J. R. (1994) C4.5: Programs for Machine Learning. Morgan Kaufmann.

  • Quinlan, J. R. (1995) MDL and categorical theories (continued). In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, CA (464–470). Morgan Kaufmann.

  • Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.

  • Rivest, R. L. (1987) Learning decision lists. Machine Learning, 2, 229–246.

  • Tibshirani, R. J. and Knight, K. (1995) Model search and inference by bootstrap “bumping”. Technical Report, University of Toronto.

  • Vapnik, V. (1995) The Nature of Statistical Learning Theory. Springer.

  • Wahba, G. (1990) Spline Models for Observational Data. SIAM.


Cite this article

Friedman, J.H., Fisher, N.I. Bump hunting in high-dimensional data. Statistics and Computing 9, 123–143 (1999). https://doi.org/10.1023/A:1008894516817


  • DOI: https://doi.org/10.1023/A:1008894516817
