
Optimal predictive partitioning

Statistics and Computing

Abstract

In many situations, one wishes to group objects into well-defined classes on the basis of one set of descriptor variables, and then predict the classes of new objects from a different set of variables. For example, a bank may categorise customers into distinct classes of financial behaviour by observing how they have behaved over a period of years, and then seek to assign new customers to future behaviour classes using information captured when they open an account. Such situations require a compromise between the compactness and integrity of the cluster structure and the accuracy of the predictive assignment to clusters. We describe two algorithms for achieving such a compromise, discuss some of their features, and illustrate their performance in a simulation study and in a liver transplant problem.
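The abstract does not spell out the two algorithms. The code below is only a minimal sketch of the kind of compromise it describes, and it is not the authors' method. It rests on several hypothetical assumptions:

* `Y` holds the descriptor variables used to define the classes, such as past financial behaviour.
* `X` holds the variables available at assignment time, such as account-opening data.
* Compactness is measured by the within-cluster sum of squares in `Y`.
* Predictive accuracy is measured by resubstitution errors of a nearest-centroid rule in `X`.
* The weight `lam` trades the two off.

A greedy single-object transfer search then lowers the combined objective.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroids(points, labels, k):
    """Mean vector of each cluster (every cluster is assumed non-empty)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, g in zip(points, labels):
        counts[g] += 1
        for j, v in enumerate(p):
            sums[g][j] += v
    return [[s / counts[g] for s in sums[g]] for g in range(k)]


def objective(Y, X, labels, k, lam):
    """Compactness in Y-space plus lam times predictive errors in X-space."""
    cy = centroids(Y, labels, k)
    wss = sum(sq_dist(y, cy[g]) for y, g in zip(Y, labels))
    # Hypothetical stand-in classifier: predict the cluster whose X-centroid
    # is nearest, evaluated on the same data (resubstitution error).
    cx = centroids(X, labels, k)
    errors = 0
    for x, g in zip(X, labels):
        pred = min(range(k), key=lambda h: sq_dist(x, cx[h]))
        errors += int(pred != g)
    return wss + lam * errors


def predictive_partition(Y, X, k, lam, seed=0, max_passes=100):
    """Greedy single-object transfers that lower the combined objective."""
    rng = random.Random(seed)
    n = len(Y)
    labels = [i % k for i in range(n)]  # balanced start: no empty cluster if n >= k
    rng.shuffle(labels)
    best = objective(Y, X, labels, k, lam)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            current = labels[i]
            if labels.count(current) == 1:
                continue  # moving object i would empty its cluster
            for g in range(k):
                if g == current:
                    continue
                labels[i] = g  # trial transfer of object i to cluster g
                value = objective(Y, X, labels, k, lam)
                if value < best - 1e-12:
                    best, current, improved = value, g, True
                else:
                    labels[i] = current  # undo the trial transfer
        if not improved:
            break  # no single transfer helps: a local optimum
    return labels, best
```

A usage example on toy data, where `X` happens to track the `Y`-clusters:

```python
Y = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
X = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
labels, value = predictive_partition(Y, X, k=2, lam=5.0)
```

Setting `lam=0` reduces the search to a plain transfer-style minimisation of the within-cluster sum of squares in `Y`. Larger values of `lam` favour partitions that the `X` variables can reproduce. Like any single-object transfer search, it only guarantees a local optimum, so in practice several seeds would be tried.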



Author information


Correspondence to Wojtek J. Krzanowski.


About this article

Cite this article

Hand, D.J., Krzanowski, W.J. & Crowder, M.J. Optimal predictive partitioning. Stat Comput 17, 11–21 (2007). https://doi.org/10.1007/s11222-006-9003-x

